Motion Analysis of Moving Object Detection and Tracking in UAV Images/Video


Mr. Ajinkya S. Mahajan

Prof. Y. M. Kurwade

Dr. V. M. Thakare

SGBAU, Amravati, India

Abstract

Motion analysis and detection of moving objects in UAV airborne images is still an unsolved problem in computer vision because of the fast, unexpected motion of both object and UAV, low resolution, noisy imagery, cluttered background, low contrast and small object size. The main reason for the inability to handle motion is the weakness of existing approaches to moving object detection. This paper reviews five motion tracking and detection techniques, i.e., Integer-Pel Search Method, Graph Cut Segmentation Framework, Spatiotemporal Oriented Energy Tracking, Variational Method and Temporal Clustering. It presents a critical analysis of these methods, showing where they lack relevance to motion analysis and which open problems must still be solved for optimum performance of moving object detection from UAV aerial images.

Keywords: Motion Analysis; Integer-Pel Search Method; Graph Cut Segmentation Framework; Spatiotemporal Oriented Energy Tracking; Variational Method; Temporal Clustering.

I. INTRODUCTION

Motion analysis is the process of constructing a description of an image sequence by determining and measuring image differences in terms of the two-dimensional vector field induced by the displacement and change of shape of the objects in the scene. With the advance of technology, unmanned aerial vehicles (UAVs) have come to play a vital role in modern warfare and industry. Moving object detection in aerial video, as the foundation of higher-level tasks such as tracking and object recognition, is essential for UAV intelligence. In contrast to applications with fixed cameras, such as traffic monitoring and building surveillance, aerial surveillance has the advantages of higher mobility and larger surveillance scope. At the same time, aerial video poses further challenges, such as changing background and low resolution.

Therefore, much attention has been paid to moving object detection in airborne video. This paper discusses five methods, i.e., Integer-Pel Search Method, Graph Cut Segmentation Framework, Spatiotemporal Oriented Energy Tracking, Variational Method and Temporal Clustering. Proper evaluation of the motion parameters of the moving object, the camera and the UAV itself can be a good bridge to solving other issues. This paper reviews existing methods and states their relevance to motion analysis, which can extend computer vision research on moving object detection from UAV aerial images by identifying appropriate methods for motion analysis.

II. BACKGROUND

Multi-object tracking (MOT) is of great importance for several computer vision tasks, with applications such as surveillance, traffic safety, automotive driver assistance systems, and robotics. In detection-based MOT methods, the goal is to determine the trajectories and identities of target instances throughout an image sequence using the detection results of each frame as observations. Typical motion estimation (ME) consists of three major steps: spatial-temporal prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best matched integer-pel position within a search window, is considered vital for video encoding. It occupies over 50% of the overall encoding time (when adopting the full search scheme) for software encoders, and introduces considerable area cost, memory traffic, and power consumption in hardware encoders. Ling Li et al. [1] find that video sequences (especially high-resolution videos) can often be encoded effectively and efficiently even without integer-pel search.


both the spatial structure and dynamics of a target in an integrated fashion, while simultaneously offering robustness to illumination variations. Specifically, the proposed feature is derived from spatiotemporal energy measurements that are computed by filtering in 3D image spacetime. These spatiotemporal energy measurements capture the underlying local spacetime orientation structure of the target across multiple scales. Peter Ochs et al. [4] suggest working with a paradigm that starts with semi-dense motion cues and afterwards fills in textureless areas based on color.

Temporal segmentation of human motion into plausible motion primitives is essential to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential number of possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. Feng Zhou et al. [5] pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA).

This paper reviews five motion analysis techniques, i.e., Integer-Pel Search Method, Graph Cut Segmentation Framework, Spatiotemporal Oriented Energy Tracking, Variational Method and Temporal Clustering. Section I is the Introduction. Section II discusses the Background. Section III discusses Previous Work. Section IV discusses Existing Methodologies. Section V discusses attributes and parameters and how they affect images. Section VI presents the proposed method and the possible outcome. Finally, Section VII concludes this review paper.

III. PREVIOUS WORK DONE

The most critical and time-consuming part of video encoding is block-matching motion estimation (ME), which removes temporal redundancies in video sequences. Technically, ME searches for the displacement of the best matched block in the reference frames (denoted the reference block) and uses it as the motion vector (MV) of each block in the current frame, as described by Ling Li et al. [1]. By encoding the MVD (the difference between the MV and its prediction) and the prediction residual (the difference between the contents of the current block and the reference block), the encoder need not record the content of the current block directly and can thus save many bits.

David Corrigan et al. [2] deal with film tear, the automatic correction of which had not been addressed previously in the community. The two main problems resulting from film tear are the relative displacement of the regions on either side of the tear and the destruction of image data along the tear boundary. The destruction of image data requires a missing-data treatment solution that is similar to the restoration of film degraded by dirt and sparkle. Previously, tear has been restored by manually joining film sections using a transparent adhesive tape (a process known as splicing) or by using image-editing software to manually realign the two regions. These methods can be laborious and are prone to failures such as subsequent stretching and damage to the splice.

The breadth of applicability of spatiotemporal oriented energy features within the field of visual tracking is demonstrated by their instantiation within three different tracking paradigms that are representative of the basic types of region trackers. Qualitative and quantitative empirical evaluation on a challenging suite of videos demonstrates the power and applicability of the proposed representation to tracking, as it outperforms other commonly used features across all tracking paradigms. Furthermore, Cannons et al. [3] show that high overall tracking accuracy can be obtained with this representation, as spatiotemporal oriented energy instantiations outperform some current state-of-the-art trackers.

It has been shown that bottom-up segmentation based on color can effectively provide so-called superpixels (small, homogeneous regions), which are actively used in many vision applications. But what about the segmentation of whole objects or meaningful parts of objects? A person could wear clothes of very different colors. How can a bottom-up approach decide which of these regions must be grouped together? Top-down object priors can resolve such ambiguity, but from which data can these priors be learned in the first place? Peter Ochs et al. [4] therefore re-emphasize the value of motion and the Gestalt principle of common fate. Motion vectors are typically more consistent within an object region than color and texture. Consequently, ambiguities in color-based segmentation fade away as soon as objects move. Studies with formerly blind people indeed show that learning from moving objects is easier than learning from static ones.

Systems that can detect, recognize, and synthesize human motion are of interest in both research and industry due to the large number of potential applications in virtual reality. The inherent difficulty of temporally decomposing human motion stems from the large number of possible movement combinations, a relatively large range of temporal scales for different behaviours, the irregularity in the periodicity of human actions, and the intraperson motion variability. To address these challenges, Feng Zhou et al. [5] frame the problem of hierarchical temporal decomposition of human motion as an unsupervised learning problem and propose hierarchical aligned cluster analysis (HACA). HACA is a generalization of kernel k-means (KKM) and spectral clustering (SC) for time series clustering and embedding.

IV. EXISTING METHODOLOGIES

Many image/video motion analysis, detection and tracking methods have been implemented over the past several decades. Different methodologies have been applied to motion segmentation and estimation, i.e., Integer-Pel Search Method, Graph Cut Segmentation Framework, Spatiotemporal Oriented Energy Tracking, Variational Method and Temporal Clustering.


A. Integer-Pel Search for Motion Estimation:

Typical motion estimation (ME) consists of three main steps: spatial-temporal prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best matched integer-pel position within a search window, is considered crucial for video encoding. It occupies over 50% of the overall encoding time (when adopting the full search scheme) for software encoders, and introduces considerable area cost, memory traffic, and power consumption in hardware encoders.

Fig. 1: Motion estimation flowcharts. (a) Typical ME flow. (b) ME without integer-pel search.

As shown in Fig. 1(a), the typical process of ME consists of three steps. The first step, spatial-temporal prediction, offers an initial search position (MVpred) for ME according to the spatial-temporal correlation of MVs. The second step is integer-pel search, which finds the best integer-pel position (denoted MVint) around MVpred within a search window. Finally, fractional-pel search is applied around MVint to obtain the best fractional-pel position (MVbest). Among the three steps, integer-pel search is widely considered to be the most important one, since it almost determines the final reference position. In the flow of Fig. 1(b), integer-pel search is skipped and the fractional-pel search is carried out directly around the initial search position, MVpred.
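The second step of the typical flow can be illustrated with a minimal full-search sketch. It is not the encoder of [1]: the function name `integer_pel_search`, the block size, the search range and the use of the sum of absolute differences (SAD) as matching cost are all assumptions made for illustration.

```python
import numpy as np

def integer_pel_search(cur, ref, top, left, block=8, mv_pred=(0, 0), search_range=4):
    """Full search over integer displacements around mv_pred.

    cur, ref : 2-D grayscale frames of equal shape.
    top, left: position of the current block inside `cur`.
    Returns (mv_int, best_sad), where mv_int = (dy, dx) minimises the SAD.
    """
    h, w = ref.shape
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_sad = None, None
    for dy in range(mv_pred[0] - search_range, mv_pred[0] + search_range + 1):
        for dx in range(mv_pred[1] - search_range, mv_pred[1] + search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```

The two nested loops visit every integer position in the window, which is why the full search scheme dominates encoding time. A fractional-pel refinement would then interpolate the reference frame around the returned position.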

B. Digital Restoration:

Besides causing local image data loss, a tear in a film results in a noticeable relative shift in the frame between the regions on either side of the tear boundary. The detection of torn frames can be performed via a visual inspection of the sequence. To digitally restore a detected torn frame, the following tasks must be performed, as shown in Fig. 2:

1) Tear Delineation, in which detected torn frames are segmented into two regions divided by the tear boundary;

2) Displacement Correction, in which the relative displacement between the segmented regions is corrected;

3) Recovery of the damaged image data.

Fig. 2: Steps involved in the restoration of torn frames.

The algorithm for the segmentation of torn frames is derived from interactive graph-cut methods used for still-image segmentation. After tear delineation, the relative displacement between the regions is estimated using a modified global-motion estimation technique, which estimates the global motion of each region with respect to an unimpaired reference frame. Once the relative displacements are corrected, the third task becomes a standard missing-data problem and can be solved using many of the existing dirt-and-sparkle removal algorithms.

C. Spatiotemporal Oriented Energy Features to Region Tracking:

A spatiotemporal oriented energy (SOE) representation, similar in spirit to representations that have been used successfully in other areas of computer vision, is applied to visual tracking. This representation offers an extremely rich target description, capturing both appearance and motion cues, while simultaneously providing significant robustness to illumination effects. Methods are developed for instantiating the SOE features in three disparate tracking architectures.

Fig. 3: spatiotemporal oriented energy representation captured for a single video frame. (Top Left)


left to right; the energy channels roughly correspond to horizontal structure, vertical structure, and leftward motion.

In addition to the 11-way feature set comparison, five additional state-of-the-art tracking systems are evaluated. When the entire set of 16 algorithms is evaluated on a suite of nine challenging videos, one of the proposed SOE-based trackers yields the best overall tracking accuracy. The other two SOE-based trackers are among the top five systems overall and show superior performance to recent strong trackers.

Fig. 4: SOE histogram for the target region in Fig.3.

D. Motion Segmentation by Long Term Video Analysis:

Occlusion detection: tracking has to be stopped as soon as a point gets occluded. This is very important, as otherwise the point trajectory will share the motion of two different objects. Occlusion detection is a common problem, considered especially in disparity estimation, but recently it has also appeared more often in conjunction with optical flow. In tracking, occlusion is usually detected by comparing the appearance of the local neighborhood of the tracked point over time. In contrast, Ochs et al. [4] detect occlusions by verifying the consistency of the forward and the backward flow, as illustrated in Fig. 5. In a non-occlusion case, the backward flow vector points in the inverse direction of the forward flow vector. If this consistency requirement is not satisfied, either the point is getting occluded at t + 1 or the flow was not correctly estimated. Both are good reasons to stop tracking this point at t. Since there are always some small estimation errors in the optical flow, a tolerance interval is granted.

Fig.5: Forward-backward matching criterion.

Each pixel in frame t is mapped to frame t + 1 via the forward optical flow vector wt. The backward flow vector at the resulting subpixel position is determined by bilinear interpolation. Concatenating the two mappings should result in approximately the original position.
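The forward-backward criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the flows are stored as (H, W, 2) arrays holding (dx, dy), the names `bilinear_sample` and `consistent_mask` are hypothetical, and the tolerance constants `rel_tol` and `abs_tol` are assumed defaults rather than values taken from this paper.

```python
import numpy as np

def bilinear_sample(field, ys, xs):
    """Sample an (H, W, 2) field at float positions (ys, xs), clamped to the image."""
    h, w = field.shape[:2]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = (1 - wx) * field[y0, x0] + wx * field[y0, x1]
    bottom = (1 - wx) * field[y1, x0] + wx * field[y1, x1]
    return (1 - wy) * top + wy * bottom

def consistent_mask(fwd, bwd, rel_tol=0.01, abs_tol=0.5):
    """Return True where the forward and backward flows cancel within tolerance.

    fwd: flow from frame t to t+1, bwd: flow from frame t+1 to t.
    Points where the mask is False should stop being tracked at frame t.
    """
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Follow the forward flow, then read the backward flow at the landing point.
    bwd_at = bilinear_sample(bwd, ys + fwd[..., 1], xs + fwd[..., 0])
    err = np.sum((fwd + bwd_at) ** 2, axis=-1)
    bound = rel_tol * (np.sum(fwd ** 2, axis=-1) + np.sum(bwd_at ** 2, axis=-1)) + abs_tol
    return err < bound
```

The bound grows with the flow magnitude, so larger motions are granted a proportionally larger tolerance interval.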

E. Temporal Clustering of Motion:


Algorithm 1: DPSearch

DPSearch is a dynamic programming (DP) based algorithm that exhaustively examines all possible segmentations in polynomial time. The algorithm to optimize aligned cluster analysis (ACA) w.r.t. G (the cluster indicator matrix) and s (the segment boundaries) is summarized in DPSearch (see Algorithm 1). This algorithm requires only two parameters: the segment length constraint n_max and the number of clusters k. The computational cost in time is mainly determined by the innermost nested loop.
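The structure of such a DP search can be sketched as below. This is a simplified stand-in, not DPSearch itself: the segment-to-cluster cost is a plain sum of squared Euclidean distances to fixed centroids rather than the kernel-based distance used by ACA, and the per-segment penalty `seg_penalty` (added to avoid trivial one-frame segments) and the function name `dp_segment` are assumptions.

```python
import numpy as np

def dp_segment(X, centroids, n_max, seg_penalty=0.1):
    """Optimal segmentation of a series under a maximum segment length.

    X: (n, d) time series, centroids: (k, d) cluster centres.
    Returns (segments, labels, cost); segments are half-open (start, end) pairs.
    """
    n = len(X)
    # Squared distance of every frame to every centroid, with prefix sums so
    # that the cost of any segment for any cluster is available in O(k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    prefix = np.vstack([np.zeros((1, d2.shape[1])), np.cumsum(d2, axis=0)])
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back_start = np.zeros(n + 1, dtype=int)
    back_label = np.zeros(n + 1, dtype=int)
    for end in range(1, n + 1):                          # segment end position
        for length in range(1, min(n_max, end) + 1):     # segment length <= n_max
            start = end - length
            seg_cost = prefix[end] - prefix[start]       # cost per cluster
            c = int(np.argmin(seg_cost))                 # best cluster for this segment
            total = best[start] + seg_cost[c] + seg_penalty
            if total < best[end]:
                best[end] = total
                back_start[end] = start
                back_label[end] = c
    # Backtrack from the end of the series to recover segments and labels.
    segments, labels = [], []
    end = n
    while end > 0:
        start = int(back_start[end])
        segments.append((start, end))
        labels.append(int(back_label[end]))
        end = start
    return segments[::-1], labels[::-1], float(best[n])
```

The loops over the end position, the segment length and (inside `argmin`) the clusters give a cost that is linear in the series length, in n_max and in k for this simplified cost. The kernel-based cost of ACA is more expensive per segment.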

V. ANALYSIS AND DISCUSSION

Because aerial video has a changing background and small objects, moving object detection in it is still an open problem that needs to be addressed further. As the camera is moving, it is not easy to build a reliable background model. In addition, the computing resources available on a UAV platform are often limited, so optical flow is not a suitable choice. Thus, most object detection methods are based on frame differencing.
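A minimal sketch of frame differencing follows. It assumes the two grayscale frames have already been registered to compensate for camera motion, which is not shown here; the function name `frame_difference_mask` and the threshold value are illustrative assumptions.

```python
import numpy as np

def frame_difference_mask(prev, curr, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold`.

    prev, curr: registered grayscale frames (e.g. uint8) of equal shape.
    """
    # Cast to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > threshold
```

The cost is a single pass over the image, which is why this family of methods suits platforms with limited computing resources.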

| Motion Analysis Technique | Advantages | Disadvantages |
|---|---|---|
| Integer-Pel Search for Motion Estimation | Video sequences can often be encoded effectively and efficiently even without integer-pel search. | It is hard for integer-pel search to reduce the final rate-distortion cost. |
| Digital Restoration | 1. Finds the global minimum of the defined energy function for a two-label problem with reasonable computational efficiency. 2. The amount of computation required to estimate the histogram is reduced. | The problem of partial tears, whereby the tear does not span the frame. |
| Temporal Clustering: hierarchical aligned cluster analysis (HACA) | 1. The temporal clustering problem is posed as an energy minimization. 2. HACA provides a natural embedding for clustering and visualizing time series data. | 1. The computational complexity is high, which limits its applicability to long sequences. 2. Success partially depends on the choice of the kernel parameters and the functional form of the kernel. |
| Motion Segmentation by Long Term Analysis | The sparsity of the point trajectories is advantageous for computational efficiency. | Trackers cover the image only very sparsely, have limited accuracy, and cannot deal with the large motion of small, independently moving parts, such as arms and legs. |
| Spatiotemporal Oriented Energy Features to Region Tracking | Image gradients remain more consistent throughout illumination changes than colour or intensity. | 1. Illumination changes are problematic for pure intensity features. 2. Feature extraction requires additional processing. |

TABLE 1: Comparisons between Different Motion Analysis Techniques

Although motion information is very important for moving object detection, there are still several drawbacks:

(1) The detected object may be larger than its real size.

(2) There may be holes in the detection results.

(3) When an object is moving slowly, its motion is unreliable.

Besides, most saliency detection methods are based on static images and focus on image classification or recognition, so they are not suitable for moving object detection.

VI. PROPOSED METHODOLOGY


the flow chart of the proposed moving object detection algorithm.

Flowchart 1: Flowchart of the proposed algorithm

Motion Saliency Detection: In airborne video, motion cues are salient and reliable across the whole image. In this paper, motion information is used in both the candidate generation process and the saliency fusion stage. To account for the effect of time delay, the forward motion history image (FMHI), which is calculated from previous frames, is used as the temporal saliency information.
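The idea of accumulating motion from previous frames can be sketched with the standard motion history image recursion. This is an assumption-laden illustration: the exact FMHI definition used by the proposed method may differ, and the function name `update_mhi` and the values of `tau` and `decay` are hypothetical.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=255.0, decay=32.0):
    """One step of a motion history image built from past frames.

    mhi: float array holding the history so far.
    motion_mask: boolean array, True where motion was detected in the new frame.
    Moving pixels are set to tau; all others fade by `decay` down to zero.
    """
    faded = np.maximum(mhi - decay, 0.0)
    return np.where(motion_mask, tau, faded)
```

Calling the function once per frame with a mask from frame differencing keeps recent motion bright and lets older motion fade, so the image can serve as a temporal saliency map.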

Pixel Saliency Detection: As the human visual system is sensitive to contrast in scenes and an object is compact in its spatial distribution, we propose a modified histogram-based contrast method to define spatial saliency at the pixel level. In particular, color contrast is weighted by its spatial distribution. As the segmented motion regions in aerial video are frequently too small for calculating the saliency value, the original motion region is enlarged by a certain factor.
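A basic histogram-based contrast measure can be sketched as follows. It is a simplification under stated assumptions: it works on quantized grayscale levels instead of colors, it omits the spatial-distribution weighting of the modified method, and the function name `histogram_contrast_saliency` and the number of bins are illustrative.

```python
import numpy as np

def histogram_contrast_saliency(gray, bins=16):
    """Per-pixel saliency from global contrast of quantized intensity levels.

    gray: uint8 image (for example the enlarged motion region).
    A level is salient when it differs strongly from the frequent levels.
    """
    q = (gray.astype(np.int64) * bins) // 256             # quantized level per pixel
    hist = np.bincount(q.ravel(), minlength=bins).astype(float)
    freq = hist / hist.sum()                               # frequency of each level
    levels = np.arange(bins, dtype=float)
    dist = np.abs(levels[:, None] - levels[None, :])       # distance between levels
    level_saliency = dist @ freq                           # contrast to all other levels
    peak = level_saliency.max()
    if peak > 0:
        level_saliency = level_saliency / peak             # normalise to [0, 1]
    return level_saliency[q]
```

Because saliency is computed per level rather than per pixel, the cost depends on the number of bins squared plus one pass over the region, which keeps the method cheap for small motion regions.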

Region Saliency Detection: Because segmented regions are compact and informative, saliency is also extracted over segmented regions to provide further information for object detection. Since the moving objects in aerial video are extremely small relative to the whole image, the region saliency is detected in the local region that is obtained from temporal saliency. The region saliency is defined as the distinctiveness of a patch from other patches in the local region.

POSSIBLE OUTCOME AND RESULT

The possible results will show that the proposed method can detect moving objects in aerial video with high efficiency and accuracy. Meanwhile, compared with an MHI-based (motion history image) method, the proposed method does not suffer from time delay.

VII. CONCLUSION

This paper studied different motion analysis techniques, i.e., Integer-Pel Search Method, Graph Cut Segmentation Framework, Spatiotemporal Oriented Energy Tracking, Variational Method and Temporal Clustering. The proposed method uses spatiotemporal saliency for moving object detection. Temporal and spatial saliency are extracted in a hierarchical manner, and both pixel saliency and region saliency are extracted to give a full description of the spatial distribution. The possible results will show that the proposed method can identify moving objects in airborne video with high efficiency and accuracy. Meanwhile, compared with an MHI-based (motion history image) method, the proposed method does not suffer from time delay.

VIII. FUTURE SCOPE

From these observations, the proposed algorithm appears more suitable for motion analysis of aerial images, and this is planned to be studied in future work. Because the detection algorithm estimates object locations in every frame independently, false alarms are unavoidable. Future work will deal with this by combining the detections with tracking information.

References

[1] Ling Li and Tao Luo, “Motion Estimation Without Integer-Pel Search”, IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1340-1353, April 2013.

[2] David Corrigan and Naomi Harte, “Algorithm for the Digital Restoration of Torn Films”, IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 573-587, February 2012.

[3] Kevin Cannons and Richard Wildes, “The Applicability of Spatiotemporal Oriented Energy Features to Region Tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 784-796, April 2014.