A Comprehensive Survey on Computer-Vision Object Detection, Segmentation, Tracking, and Feature Extraction


Guru Prasad M. Bhat¹, Dr. Nagaraj G. Cholli²

¹Research Scholar, RV College of Engineering, Visvesvaraya Technological University, Belagavi

[email protected]

²Associate Professor, Dept. of ISE, RV College of Engineering, Bengaluru

[email protected]

Abstract

Computer-Vision [CV] applications such as surveillance and autonomous vehicle navigation involve object detection and tracking using multiple features[1]–[4]. Video surveillance technology can be deployed effectively in real-time applications such as public security and logistics management, which involve continuous monitoring of the environment. Detecting moving objects in a video sequence is the primary step that enables object tracking and behavior understanding. Video surveillance of a dynamic scene containing objects such as vehicles and humans is a challenging task in CV.

Moving object detection involves designing efficient video surveillance algorithms for complex environments with variations in illumination, brightness, and environmental effects. Capturing and studying moving object segmentation is an important task in vision-based analysis of object movement. Monitoring human behavior is an important task in a security surveillance system, and it has become an active research area because of the need for such systems in civic areas. Object tracking algorithms such as point tracking, kernel tracking, and silhouette tracking detect the static background when the camera is fixed and the illumination changes slowly, and they separate foreground objects from the current frame. Static-background surveillance systems are monitored by individuals who view several screens showing various camera feeds.

Detection against a dynamic background is very difficult, and manual monitoring of a dynamic scene is far less effective. The large amount of data in a video sequence also makes manual initialization cost-prohibitive, which motivates automating the entire process.

Research in CV has addressed semi-automatic and automatic methods for analyzing the contents of video data in order to aid human operators. Advances in computation and communication techniques have increased the research interest in automating video content analysis.

Research interest in object detection, segmentation, tracking, and feature extraction has grown vastly over the years; this paper therefore presents a summary of these fields. It analyses the literature according to the various CV disciplines, the successes demonstrated, and the applicability of the research to real-world problems.


Keywords: CV, moving object, object segmentation, video sequence, tracking, multiple features, automatic methods.

1. Introduction

Research on CV in the 1980s focused on performing quantitative image analysis[5]–[9]. It aimed at developing and refining mathematical algorithms for image and video processing and at improving the ability to detect objects in images. In the subsequent period, researchers continued to develop relevant techniques, which further advanced the CV field[10]–[13].

A CV pipeline starts with a video input from which frames are extracted. The extracted frames are de-noised, and the de-noised frames are used for moving object recognition. Moving object recognition involves algorithms such as the Background Subtraction [BS] algorithm, spatial segmentation algorithms, and others. In the next step, object pixels are used to identify and differentiate stationary and moving objects, and single and multiple objects are detected and segmented. The Region of Interest [ROI] of each object is used to track its motion. After the moving objects are identified, analysis and information extraction are performed. The steps of moving object tracking are shown in Fig. 1.

Fig. 1 Computer Vision Steps

CV involves challenges[14] in object detection, segmentation, and tracking[15][16]. Brightness is one such challenge: it varies drastically between indoor and outdoor environments and between day and night. Weather conditions are also a vital factor that changes the brightness. The shadow cast by an object under illumination is a further challenge, since it must be identified and distinguished from the object for efficient detection.

The position of the object is a challenge because it must be matched in every frame of a video taken from different angles. Objects that are rotated or mirrored also pose a challenge in object detection and tracking. Occlusion increases the difficulty of identifying and separating objects in an image. Scaling of objects to different sizes must likewise be addressed by an efficient CV technique.

In this paper, a detailed classification of CV techniques for moving object tracking and their advancements is surveyed.


Fig. 2 classification of CV for Moving Object detection, classification and tracking.

2. Related Work

A. Moving object Detection:

Detecting moving objects in video streams is the first relevant step of information extraction, and BS[14][15] is a very popular approach for foreground segmentation. Various works address the design of effective video surveillance systems for complex conditions. Distinct BS techniques that overcome the challenges of illumination variations, background clutter, and shadows have been discussed.

Fig. 3 Background Subtraction

The BS algorithm is categorized into modelling, initialization, maintenance, and detection stages.

Background modelling is further classified into basic background modelling, statistical background modelling, fuzzy background modelling, neural network modelling, background modelling by clustering, and background estimation.

Based on the above methods, many collections of BS algorithms are available in the BGS library. Each method has its own merits and demerits, as identified by experimental results. BS is formulated as shown below.
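As an illustration of the basic BS decision rule, a pixel is commonly labeled foreground when |I_t(x, y) - B(x, y)| > T, where I_t is the current frame, B the background model, and T a threshold. The following is a minimal sketch under that assumption; the function name, the threshold value, and the toy frames are hypothetical and are not taken from the surveyed works.

```python
def subtract_background(frame, background, threshold=25):
    """Return a binary mask: 1 where |frame - background| > threshold, else 0."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy grayscale images (rows of intensities); values are illustrative only.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 10], [10, 180, 11]]
mask = subtract_background(frame, background)
print(mask)
```

Small intensity changes stay below the threshold and are treated as background, which is why the choice of T matters for the accuracy of the simpler methods.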

The running Gaussian average[15] is one of the fastest methods and has a low memory requirement. However, its computational load is reported to be high, its update rate is lower than the frame rate, its accuracy is low, and it does not handle multimodal backgrounds. The temporal median filter is another approach, which increases the stability of the background model; its computation requires a buffer of recent pixel values, and its accuracy is low. Sequential kernel density approximation has a low memory requirement but a high computational cost. The mixture of Gaussians algorithm is more accurate but extremely costly and more complex. Frame difference[13] is a simple algorithm, but the threshold must be predefined, so it may not be as accurate as expected. Kernel density estimation[15] is accurate and gives stable results, but it is slower than averaging techniques and has a high memory requirement. The probability density function of color for the Gaussian model is stated below.
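A commonly used form of the Gaussian density for an n-dimensional color vector X_t, given here as an assumption about the intended formula, is shown first; the second line is the corresponding mixture of K Gaussians with weights \omega_{i,t}. The symbols \mu, \Sigma, K, and \omega are notation introduced for this sketch.

```latex
\eta(X_t;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(X_t-\mu)^{\top}\Sigma^{-1}(X_t-\mu)\right)

P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\,\eta\!\left(X_t;\mu_{i,t},\Sigma_{i,t}\right)
```

With K = 1 the mixture reduces to the single-Gaussian model, which is why that model cannot represent a multimodal background.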

Table 1 discusses the major methods of object detection along with their pros and cons.

| Method | Precision | Processing time | Pros | Cons |
|---|---|---|---|---|
| Mixture of Gaussians | Fair | Sizeable time | Low storage requirement. | It does not cope with multi-modal background. |
| Approximate Median | Low to Fair | Sizeable time | It does not require sub-sampling of frames for creating an adequate background model. | Computation requires a buffer with the recent pixel values. |
| Optical Flow | Fair | High | It is helpful in extracting complete movement information. | Complex and lengthy calculations. |
| Frame Differencing | High | Low to Sizeable time | Simple method. Performs well for a static background. | It requires a background without moving objects. |

Table 1 Methods of object detection with their pros and cons
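The buffer requirement noted for the median-based background models can be seen in a small sketch of a temporal median filter. This is an illustrative example only: the class name, the buffer size, and the toy frames are assumptions, and frames are represented as plain lists of grayscale rows.

```python
from collections import deque
from statistics import median


class TemporalMedianBackground:
    """Background model: per-pixel median over a buffer of recent frames."""

    def __init__(self, buffer_size=5):
        # The buffer of recent frames is the memory cost of this method.
        self.buffer = deque(maxlen=buffer_size)

    def update(self, frame):
        # Store a copy so later changes to the caller's frame do not leak in.
        self.buffer.append([row[:] for row in frame])

    def background(self):
        rows, cols = len(self.buffer[0]), len(self.buffer[0][0])
        return [
            [median(f[r][c] for f in self.buffer) for c in range(cols)]
            for r in range(rows)
        ]


model = TemporalMedianBackground(buffer_size=3)
for frame in ([[10, 10, 10]], [[11, 10, 200]], [[10, 12, 10]]):
    model.update(frame)
estimated = model.background()
print(estimated)
```

The object that briefly passes through the last pixel (value 200) does not affect the estimate, because the median ignores short-lived outliers as long as the background is visible in more than half of the buffered frames.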


B. Object segmentation:

Object segmentation[13], [18]–[23] comprises different methods, each with its advantages and disadvantages. In the region splitting and merging technique, the image can be split progressively according to the requested resolution, because the number of splitting stages is set by the algorithm. The image is split using the mean or variance of the segmented pixel values. The merging criteria may differ from the splitting criteria.

Fig. 4 Object Segmentation

Fig. 5 Segmentation Process

In the hierarchical algorithm approach, the process and associations of Hierarchical Clustering [HC] can be assessed by examining the dendrogram. The outcome of HC shows a high association with the features of the original record. HC calculates the distance between each pair of patterns instead of calculating the centroids of clusters[25]. However, HC provides a detailed level of clustering, which requires high computation time. The K-Means clustering algorithm[25] is comparatively easy to compute and takes less computation time than HC, but its result is sensitive to the initial random centroids. A further limitation of K-Means is that, unlike HC, it cannot show clustering details. The Mean Shift algorithm is a versatile tool for feature space analysis and is therefore suitable for feature spaces. However, its output is controlled by factors such as the kernel bandwidth, and its computation time is quite long. In the Watershed algorithm approach, the boundaries of each region are continuous, which is an advantage. However, the segmentation result suffers from over-segmentation, and the algorithm is comparatively time-consuming.
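The K-Means behaviour described above, including its dependence on the initial centroids, can be sketched on one-dimensional pixel intensities. This is a minimal illustration under stated assumptions: the function name, the fixed iteration count, and the explicitly supplied initial centroids (instead of random ones) are choices made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster scalar intensities; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


pixels = [12, 15, 10, 200, 210, 205]  # toy dark and bright intensities
labels, centroids = kmeans_1d(pixels, centroids=[0, 255])
print(labels, centroids)
```

Passing the initial centroids explicitly makes the run repeatable; with random initialisation the same data can converge to a different partition, which is the sensitivity noted in the text.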

In the region-based segmentation technique[26], the Change Detection Mask [CDM] method with local thresholding reduces noise and false detections and detects boundaries accurately, but its computation is time-consuming. The spatial segmentation method improves computational efficiency and reduces noise, but it is useful only for a still background and is tedious. The intelligent scissors method has a good computation time and is more efficient, but it is not accurate enough over a long processing period. The Reference Frame Subtraction Double Difference Algorithm [RFSDDA] method is robust to background variations and also detects noise, but its computation is complex and it results in inaccurate segmentation. In the modified-BS method, shadows are detected and eliminated, but the size of the video must be limited and only defined formats can be used. Overall, the computational complexity of the region-based segmentation technique is high.

In the boundary-based technique, the optical flow[27] method has high computational speed and is robust and accurate, but it produces more false detections. The contour linkage method reduces the computational load, but it is unsuitable for complex backgrounds. This technique has low computational complexity.

In the region- and boundary-based technique, the frame difference method provides efficient foreground and edge detection, but it does not detect and eliminate shadows and fails with a dynamic background. The hierarchical Markov Random Field [MRF] method detects edges and their orientations more efficiently, but it is sensitive to noise. The combination of BS and Sobel filtering refines the object boundary accurately, but it is not appropriate for segmenting slow-moving objects, and the background is not preserved. This technique has high computational complexity. Table 2 presents a comparative study of object classification methods.

| Method | Precision | Processing time | Pros | Cons |
|---|---|---|---|---|
| Shape-Based | Moderate | Low | It is a simple pattern-matching approach that can be applied with a suitable model. | It does not work in dynamic situations and is unable to determine internal movements well. |
| Motion-Based | Moderate | High | It does not require predefined pattern templates. | It struggles to identify a non-moving human. |
| Texture-Based | High | High | It provides improved quality at the cost of additional computation time. | Requires a pre-defined template or classifier for the human area that must be obtained through training. Requires significant processing time to detect the human object. |
| Color-Based | High | High | It creates a Gaussian Mixture model to describe the color distribution within the sequence of images and to segment the images into background and objects. | The color-based approach depends on the image recording device, and color alone does not provide enough information for classification. |

Table 2 Comparative study of object classification methods

C. Object tracking:

Moving object tracking[20], [22], [24], [28]–[33] infers the position, velocity, and scale of an object from a video sequence. Object tracking is classified into point tracking, kernel tracking, and silhouette tracking techniques. Among point tracking techniques[15], the Kalman filter method tracks a single object and gives an optimal solution, but it cannot differentiate occluded objects. The particle filter method tracks multiple objects, is optimal, and can detect occluded objects. Multiple hypothesis tracking performs multiple object tracking with occlusion detection and an optimal solution.
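To make the point-tracking idea concrete, the following is a minimal sketch of a Kalman filter that tracks the position and velocity of a single object along one axis. It assumes a constant-velocity motion model and position-only measurements; the function name, the noise parameters q and r, and the initial covariance are illustrative assumptions, not values from the cited works.

```python
def kalman_track(measurements, dt=1.0, q=1e-4, r=1.0):
    """Constant-velocity Kalman filter; returns final (position, velocity)."""
    p, v = 0.0, 0.0                       # state estimate
    P = [[1000.0, 0.0], [0.0, 1000.0]]    # large initial uncertainty
    for z in measurements:
        # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
        p, v = p + dt * v, v
        a, b, c, d = P[0][0], P[0][1], P[1][0], P[1][1]
        P = [[a + dt * (b + c) + dt * dt * d + q, b + dt * d],
             [c + dt * d, d + q]]
        # Update with a position measurement z (H = [1, 0]).
        s = P[0][0] + r                    # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        y = z - p                          # innovation
        p, v = p + k0 * y, v + k1 * y
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return p, v


# Object centroid moving 2 pixels per frame along one axis (toy data).
positions = [2.0 * k for k in range(20)]
position, velocity = kalman_track(positions)
print(position, velocity)
```

The filter recovers the velocity even though only positions are measured. A single filter of this kind follows one target, which is consistent with the single-object limitation noted above.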

Fig. 6 Object Tracking

Kernel tracking[28] is another object tracking technique. Simple template matching tracks a single object and detects partial occlusion. The mean shift method also performs single object tracking with partial occlusion detection. A Support Vector Machine performs single object tracking with partial occlusion detection but requires training. Layer-based tracking performs multiple object tracking, but occlusion detection cannot be done with this method.
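Simple template matching can be sketched as an exhaustive search for the window that best matches a stored object template. The example below uses the sum of squared differences as the matching cost, which is one common choice and an assumption here; the function name and the toy image are hypothetical.

```python
def match_template(image, template):
    """Return ((row, col), cost) of the best match by sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            cost = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


image = [
    [0, 0, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 7, 6],
    [0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
position, cost = match_template(image, template)
print(position, cost)
```

In a tracker the search would typically be restricted to a neighbourhood of the previous position. When the object is partially occluded the best cost rises but the minimum often stays near the true location.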

In the silhouette tracking[33] technique, shape matching tracks a single object and does not require any training, whereas the contour matching method performs multiple object tracking that is optimal and requires training, but it does not track occluded objects. Table 3 discusses the different tracking methodologies with their pros and cons.

Fig. 7 Object tracking classification


| Methodology | Method | Pros | Cons |
|---|---|---|---|
| Point tracking | Kalman Filter | Single object tracking. Optimal tracking. | Cannot detect occlusion. |
| Point tracking | Multiple Hypotheses Tracking | Multiple object tracking. Optimal tracking; can detect occlusion. | Complexity and computational growth. |
| Point tracking | Particle Filter | Optimal tracking; can detect occlusion. | With uninformative sensor readings, samples tend to congregate. A high number of particles is needed. Computationally expensive. |
| Kernel tracking | Template Matching | Single object tracking. Does not require training. Detects partial occlusion. | |
| Kernel tracking | Mean Shift | Single object tracking. Detects partial occlusion. | |
| Kernel tracking | Support Vector Machine | Single object tracking. Requires training. Detects partial occlusion. | |
| Kernel tracking | Layering based tracking | Does not require training. Detects occlusion. | Compensation of the background motion needs to be done for accuracy. |
| Silhouette tracking | Shape matching | Single object tracking. Does not require training. | Cannot detect occlusion. |
| Silhouette tracking | Contour matching | Does not require training. Detects occlusion. | Knowledge about the desired contour shape is needed beforehand. If the appearance of the tracked object changes drastically during occlusion, the tracking precision is lost. |

Table 3 Tracking methodology comparison with its pros and cons.

D. Multiple Feature Extraction:

Feature identification[34]–[36] and the extraction of meaningful data from an image are part of the process of object detection and tracking in CV. A feature captures relevant information about an image and assists in this process. Features can be classified as pixel-level features, local features, and global features. Application-dependent features include the human face, fingerprints, and veins. The important forms of features to consider when identifying such signs are spatial, temporal, and textural. A feature vector is expected to be similar and accurate across all images containing a similar object. A feature should also be unique within the image, so that it is easily separable across a set of images.

Feature vector extraction algorithms include FAST (Features from Accelerated Segment Test), BRIEF (Binary Robust Independent Elementary Features)[37], ORB (Oriented FAST and Rotated BRIEF), SIFT[38] (Scale Invariant Feature Transform), and SURF (Speeded Up Robust Features). The SIFT and SURF feature algorithms are the most widely used owing to their robustness.

ORB is the fastest to compute, while SIFT performs best across the widest range of situations. In the special case where the angle of rotation is a multiple of 90 degrees, ORB and SURF outperform SIFT, and on noisy images ORB[38] and SIFT show comparable performance. In ORB, the features are mostly concentrated on objects at the center of the image, whereas the key points of the SURF, SIFT, and FAST detectors are distributed over the whole image. Because of its high computation speed, the FAST algorithm is appropriate for real-time video processing.
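Part of the speed advantage of binary descriptors such as BRIEF and ORB comes from how they are matched: descriptors are bit strings compared with the Hamming distance. The following is a minimal sketch of brute-force matching under that assumption. The descriptors are stored as Python integers, and the 8-bit values are made up for illustration (real BRIEF and ORB descriptors are typically 256 bits).

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train):
    """For each query descriptor, return (query_idx, train_idx, distance)
    of its nearest neighbour in train by Hamming distance."""
    matches = []
    for qi, q in enumerate(query):
        ti = min(range(len(train)), key=lambda i: hamming(q, train[i]))
        matches.append((qi, ti, hamming(q, train[ti])))
    return matches


# Hypothetical 8-bit descriptors from two frames.
query = [0b10110010, 0b00001111]
train = [0b00001110, 0b10110011, 0b11110000]
matches = match_descriptors(query, train)
print(matches)
```

The distance is a single XOR followed by a bit count, which is far cheaper than the Euclidean distance used for floating-point descriptors such as SIFT and SURF.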

Segmenting objects with shadow or ghost effects still needs more research and should be included in object segmentation methods. Tropical areas receive more rain, which reflects colors and affects object segmentation and tracking; this also needs further research. Table 4 discusses the pros and cons of feature matching techniques.

| Methods | Pros | Cons |
|---|---|---|
| FAST | It is efficient and finds reasonable corner key points. | It does not produce a measure of cornerness. |
| BRIEF | Its performance is similar to SIFT in many respects, including robustness to lighting, blur, and perspective distortion. | It performs poorly if there is an in-plane rotation. |
| SIFT | Solves image rotation, affine transformations, intensity, and viewpoint change in matching features. | Low noise immunity. |
| ORB | In ORB, a rotation matrix is computed using the orientation of the patch, and then the BRIEF descriptors are steered according to the orientation. | Relatively immune to Gaussian image noise. |
| SURF | Its low computational cost. | Moderate noise immunity. |

Table 4 Feature matching techniques, pros and cons.

Conclusion:

CV has developed vastly as a field of importance in research and application[39]. This article has discussed various algorithms for object detection, segmentation, tracking, and feature extraction, together with their pros and cons.

In object detection, the mixture of Gaussians is a probabilistic model used to estimate pixel density. The approximate median filter calculates the pixel value difference in each frame. Optical flow uses flow vectors, which are computationally complex, and it is useful where the camera is moving with the object. Finally, frame differencing is a simple, low-complexity approach to object detection, but it suffers from a single pixel-value threshold, which limits its application.

In object segmentation, the shape-based method needs prior knowledge, which limits its ability to identify deformed objects. The motion-based method is used prominently, though in a limited way, in vehicle and human movement detection and surveillance. Texture-based algorithms perform natural and artificial text segmentation and classification, applied to the data retrieval process. The color-based segmentation method relies on the color values of the pixels of an image, and it does not yield the required result when pixel intensity values change abruptly.

In object tracking, the major classification comprises point tracking, which is used when detected objects are represented as points; kernel tracking, which computes the motion of an object from one frame to the next, where the kernel refers to the object's shape and appearance[40]; and silhouette tracking, in which a dark shape seen against a light surface is matched and which is used for complex-shaped and/or nonrigid objects.

Among multiple feature extraction methods, SURF and SIFT are two feature detection and extraction algorithms that are robust compared with the other algorithms. SIFT and SURF descriptors are quite robust to noise and detection errors and remain invariant to photometric changes. Computing the SIFT operator takes a long time, which leads to low efficiency compared with SURF, which has good efficiency and quick computation.

After a careful analysis of the various algorithms for object detection, segmentation, and tracking, it can be concluded that there is a need for a generic algorithm that has improved efficiency and can be implemented in hardware for better application in CV.

Acknowledgments

We thank all who have supported this survey, knowingly or unknowingly.

References

[1] I. Pitas, Digital image processing algorithms and applications. John Wiley & Sons, 2000.

[2] B. Jahne, “Digital image processing: concepts, algorithms, and scientific applications,” Berlin, Heildelb. Springer-Verlag, vol. 570, 1997.

[3] K. Preston, M. J. B. Duff, S. Levialdi, P. E. Norgren, and J. I. Toriwaki, “Basics of cellular logic with some applications in medical image processing,” Proc. IEEE, vol. 67, no. 5, pp. 826–856, 1979.

[4] L. Anuj and M. T. G. Krishna, “Multiple camera based multiple object tracking under occlusion: A survey,” in 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 432–437, 2017.

[5] M. Yachida and S. Tsuji, “Industrial Computer Vision in Japan,” vol. 75, 1980.

[6] W. E. L. Grimson, “An implementation of a computational theory of visual surface interpolation,” Comput. Vision, Graph. Image Process., vol. 22, no. 1, pp. 39–69, 1983.

[7] D. J. Granrath, “The role of human visual models in image processing,” Proc. IEEE, vol. 69, no. 5, pp. 552–561, 1981.

[8] M. J. B. Duff and S. Levialdi, Languages and architectures for image processing, vol. 236. Academic Press London, 1981.

[9] A. K. Jain, “Advances in mathematical models for image processing,” Proc. IEEE, vol. 69, no. 5, pp. 502–528, 1981.

[10] S. Sladojevic, M. Arsenovic, A. Anderla, D. Culibrk, and D. Stefanovic, “Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification,” Comput. Intell. Neurosci., vol. 2016, http://dx.doi.org/10.1155/2016/3289801, 2016.

[11] P. Patil, P. Shettar, P. Narayankar, and M. Patil, “An efficient method of detecting exudates in diabetic retinopathy: Using texture edge features,” International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1188–1191, 2016.

[12] G. Mahendran, R. Dhanasekaran, and N. D. KN, “Morphological process based segmentation for the detection of exudates from the retinal images of diabetic patients,” International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), pp. 1466–1470, 2014.

[13] I. Engineering, “DETECTION AND TRACKING OF MOVING OBJECT IN VISUAL SURVEILLANCE,” pp. 3711–3719, 2013.

[14] R. Mech, “Detection of Moving Cast Shadows for Object Segmentation,” vol. 1, no. 1, pp. 65–76, 1999.

[15] R. K. Rout, “Video Object Segmentation: A Survey,” Appl. Comput. Informatics, vol. 8, no. 1, pp. 1–18, 2014.

[16] P. S. Ganar, R. Shende, P. Hu, J. Mirza, and R. Sohail, “Image Processing Technique Based Color Image Object Tracker,” vol. 7, no. 3, pp. 5735–5737, 2017.

[17] B. Tamersoy, “Background Subtraction,” 2009.

[18] Y. Benezeth et al., “Comparative study of background subtraction algorithms,” HAL Id: inria-00545478, 2012.

[19] S. Chien and L. Chen, “Efficient Moving Object Segmentation Algorithm Using Background Registration Technique,” vol. 12, no. 7, pp. 577–586, 2002.

[20] P. Spagnolo, “Moving object segmentation by background subtraction and temporal analysis,” no. May, 2006.

[21] D. Bobkov, S. Chen, M. Kiechle, S. Hilsenbeck, and E. Steinbach, “Noise-resistant Unsupervised Object Segmentation in Multi-view Indoor Point Clouds,” 2017.

[22] C. Kim and J. Hwang, “Fast and Automatic Video Object Segmentation and Tracking for Content-Based Applications,” vol. 12, no. 2, pp. 122–129, 2002.

[23] A. Tah, S. Roy, P. Das, and A. Mitra, “Moving Object Detection and Segmentation using Background Subtraction by Kalman Filter,” vol. 10, no. May, 2017.

[24] Y. Wu, S. Member, X. He, and T. Q. Nguyen, “Moving Objects Detection with Freely Moving Camera via,” vol. 8215, no. c, pp. 1–13, 2015.

[25] L. K. Lee, S. C. Liew, and W. J. Thong, “A review of image segmentation methodologies in medical image,” in Advanced computer and communication engineering technology, Springer, pp. 1069–1080, DOI: 10.1007/978-3-319-07674-4_99, 2015.

[26] A. Mathematics, “COMPARATIVE STUDY OF IMAGE SEGMENTATION TECHNIQUES ON CHRONIC KIDNEY,” vol. 118, no. 14, pp. 235–239, 2018.

[27] A. Nara, C. Allen, and K. Izumi, “Surgical Phase Recognition using Movement Data from Video Imagery and Location Sensor Data,” in Advances in Geocomputation, pp. 310–316, DOI: 10.1007/978-3-319-22786-3_21, 2017.

[28] D. Comaniciu and P. Meer, “Kernel-Based Object Tracking,” pp. 1–30, IEEE, DOI: 10.1109/TPAMI.2003.1195991, 2003.

[29] M. Thanh, N. Truong, and S. Kim, “Parallel implementation of color-based particle filter for object tracking in embedded systems,” Human-centric Comput. Inf. Sci., pp. 1–13, 2017.

[30] Z. Li, S. Gao, and K. Nai, “Robust object tracking based on adaptive templates matching via the fusion of multiple features,” J. Vis. Commun. Image Represent., vol. 44, pp. 1–20, 2017.

[31] C. Y. R. V. A. Prisacariu and O. K. I. D. Reid, “Real-Time Tracking of Single and Multiple Objects from Depth-Colour Imagery Using 3D Signed Distance Functions,” Int. J. Comput. Vis., 2017.

[32] T. Journal, M. Amherst, T. Roorkee, and T. Bhubaneswar, “Moving object detection using modified temporal differencing and local fuzzy thresholding,” no. July, 2016.

[33] B. Rosenhahn et al., “A Silhouette Based Human Motion Tracking System,” pp. 1–33, 2005.

[34] D. Mistry, “Comparison of Feature Detection and Matching Approaches : SIFT and SURF,” vol. 2, no. 4, pp. 7–13, 2017.

[35] J. L. Miranda, B. D. Gerardo, and B. T. I. I. I. Tanguilig, “Pest Detection and Extraction Using Image Processing Techniques,” Int. J. Comput. Commun. Eng., vol. 3, no. 3, pp. 189–192, 2014.

[36] E. Salahat and M. Qasaimeh, “Recent Advances in Features Extraction and Description Algorithms : A Comprehensive Survey.”

[37] J. Xu, H. Chang, S. Yang, and M. Wang, “Fast feature-based video stabilization without accumulative global motion estimation,” IEEE Trans. Consum. Electron., vol. 58, no. 3, 2012.

[38] E. Karami, S. Prasad, and M. Shehata, “Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images,” Newfoundland Electrical and Computer Engineering Conference, 2015.

[39] “Global Video Surveillance Market Analysis and Forecast (2017-2023) - Focus on Ecosystem (Camera, Monitor, Storage, Software and Service) and Application (Infrastructure, Commercial, Residential, Industrial, Institutional and Others),” BIS Research, 2017. https://www.researchandmarkets.com/research/jwbnbg/global_video?w=4. Accessed on 14 November 2019.

[40] S. D. Lin, J.-J. Lin, and C.-Y. Chuang, “Particle filter with occlusion handling for visual tracking,” IET Image Process., vol. 9, no. 11, pp. 959–968, 2015.

Guru Prasad M. Bhat obtained his B.E. in Electrical and Electronics Engineering and his M.Tech. in VLSI Design from Visvesvaraya Technological University, Belagavi, Karnataka, India. He is currently pursuing a Ph.D. at Visvesvaraya Technological University, Belagavi.

He is presently doing research on Image Processing and Computer Vision.

He has presented his research work at national conferences and published papers in national and international journals. His areas of interest are Image Processing, VLSI Design, and Computer Vision. He is a lifetime member of ISTE.

Dr. Nagaraj G. Cholli obtained his B.E. in Computer Science & Engineering from Visvesvaraya Technological University, Belagavi, Karnataka, India, his M.Tech. in Computer Science & Engineering from IIT Roorkee, India, and his Ph.D. from Visvesvaraya Technological University, Belagavi, India, completed in 2016 in the field of “Software Aging and Rejuvenation”.

He is presently working as Associate Professor in the Department of Information Science and Engineering, R.V. College of Engineering, Bengaluru, Karnataka, India, with 13+ years of research, industry, and teaching experience in India and abroad.

Research guidance: 26 PG students guided and 6 Ph.D. students currently being guided. He is working on consulting and funded projects approved by the Government of Karnataka and the Government of India. He has published more than 30 research papers in various national and international conferences and journals and has filed 6 patents. He is also a lifetime member of the ISTE, CSI, Sciei, IAENG, and IRED professional societies.