Video 3 – Drone launch, multiple motion vectors

4.3 Experimental Results for Edge Flow

4.3.3 Video 3 – Drone launch, multiple motion vectors

The video shows a high-density, complex scene of static objects with occasional small moving objects (cars and vans), as shown in Figure 4.14. The detection results are given in Table 4.4. The algorithm is capable of discriminating between groups of houses and other landmarks while excluding the general background (the forest in this case). The poor performance of both the optical flow and motion estimation techniques arises because neither the image warping nor the brightness-pattern tracking can keep up with the rate of change of the camera perspective, which is another important limitation of existing methods.


Fig. 4.14 Video 3 scene - drone flying with multiple axes of motion

(a) Edge Flow (b) Motion Estimation (c) Optical Flow

Fig. 4.15 A comparison of Edge Flow with Motion Estimation and Optical Flow on the UAV video

Table 4.4 Detection performance for video 3

Algorithm           Total Detections   TP    FP   FN
Motion Estimation   15                 2     13   125
DF TVL1 OF          5                  4     1    123
Edge Flow           141                119   22   8
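The counts in Table 4.4 can be summarised as precision and recall. These two metrics are not reported in the text above; the following is a minimal sketch that derives them from the tabulated TP, FP and FN values using the standard definitions. The names `TABLE_4_4`, `precision_recall` and `summarise` are illustrative only.

```python
# Counts copied from Table 4.4: (true positives, false positives, false negatives).
TABLE_4_4 = {
    "Motion Estimation": (2, 13, 125),
    "DF TVL1 OF": (4, 1, 123),
    "Edge Flow": (119, 22, 8),
}


def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one row of detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def summarise(table):
    """Map each algorithm name to its (precision, recall) pair."""
    return {name: precision_recall(*counts) for name, counts in table.items()}


if __name__ == "__main__":
    for name, (p, r) in summarise(TABLE_4_4).items():
        print(f"{name:<18} precision={p:.3f} recall={r:.3f}")
```

In every row TP + FN equals 127, so all three algorithms are scored against the same set of objects. On these counts Edge Flow reaches a precision of about 0.844 and a recall of about 0.937, whereas the recall of the other two techniques is below 0.04.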


Table 4.5 compares the processing performance of the algorithms. The results were obtained by processing each video sequence for 500 frames and recording the minimum, maximum and average frame rate.

Table 4.5 Performance analysis of each algorithm across each test video stream

Resolution    FPS   Dual Flow TVL1 OF   ME (ARTOD)   Edge Flow   WISE
640 x 360     Min   0.16                2.84         19.59       24.97
              Max   0.17                8.48         82.77       43.85
              Avg   0.17                4.58         58.01       32.99
848 x 480     Min   0.14                1.00         7.31        11.03
              Max   0.16                5.38         39.44       21.69
              Avg   0.15                2.60         24.35       18.69
1920 x 1080   Min   0.04                0.22         0.66        0.80
              Max   0.04                1.09         6.25        5.10
              Avg   0.04                0.35         1.85        3.81
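The measurement procedure behind Table 4.5 (500 frames per sequence, recording minimum, maximum and average frame rate) could be reproduced along the following lines. This is a sketch under stated assumptions, not the benchmarking code used for the table: `measure_fps` and `process_frame` are hypothetical names, and the text does not say how the average was formed, so the sketch assumes it is the number of frames divided by the total processing time.

```python
import itertools
import time


def measure_fps(process_frame, frames, max_frames=500):
    """Time process_frame on up to max_frames frames; return (min, max, avg) FPS."""
    rates = []
    total = 0.0
    for frame in itertools.islice(frames, max_frames):
        start = time.perf_counter()
        process_frame(frame)
        # Guard against a zero reading from a coarse clock.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(1.0 / elapsed)
        total += elapsed
    if not rates:
        raise ValueError("no frames were processed")
    # Assumption: average = frames processed / total processing time.
    return min(rates), max(rates), len(rates) / total


if __name__ == "__main__":
    # Stand-in workload: each "frame" is a list of numbers to be summed.
    dummy_frames = [list(range(1000))] * 600
    lo, hi, avg = measure_fps(sum, dummy_frames)
    print(f"min={lo:.1f} max={hi:.1f} avg={avg:.1f} fps")
```

Timing only the call to `process_frame` keeps decoding and display time out of the figure; whether the reported results include those costs is not stated in the text.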

Using this new approach, the detection of texture patches can be carried out accurately and in real time. In this work we demonstrate the capabilities of the algorithm on video scenarios and show that object textures in the scene are reliably detected. We show clearly that the algorithm is robust in occlusion scenarios, works in real time, and defines clear objects where other techniques attribute such small detections to noise.

The method set out in this work is novel in its approach to the moving camera problem of detecting all objects in a scene. All existing techniques assume that foreground objects of interest must be moving or changing in some way, and can only detect such objects. This method enables both moving and static (unchanging) objects to be detected. This is a significant step forward, paving the way for the detection of small, minor objects as well as the large moving parts of a scene. In addition, the method makes no prior assumptions about the scene and is wholly data driven. This last point is critical: what other techniques dismiss as noise or unimportant, this technique extracts and highlights as an object texture. This retains information that would otherwise be lost at the detection stage, and which can then be filtered and analysed as required. Key objects or people can easily disappear into the background if the detection algorithm dismisses small or “noise-like” novelties early on. Such detections can later be filtered out based on the object parameters.


The direction and relative speed can be associated with the edge gradients – a sharper gradient indicates a higher relative speed, with the sharper gradient being the leading edge of the object texture. This form of clustering is robust and, combined with the first two components, is resistant to occlusion. Should a texture patch be partially occluded by another texture patch, they will remain separate clusters unless the object is completely occluded. Further, once the occluding object has moved on, the cluster will return to being a separate texture patch.

Currently, the approach is parametric, requiring a magnitude range to be defined. This range defines how similar candidate pixel gradients must be in order to be linked together as an edge. In principle it is possible to define the gradient magnitude range autonomously, but this is left for future work. Through this method, similar and proximate edges are clustered together, resulting in a contiguous object being defined for each different texture (object with edges); an object is defined as an area of similar texture, not as an isolated object per se. For example, a car may be defined as three separate texture patches in Edge Flow: the bonnet, which is of a particular texture; the roof, which is a different texture; and the boot, which is the same texture as the bonnet but separated from it by the roof.

The main innovations of this approach are as follows. A motion vector can be extracted from each texture patch within a scene in real time. Objects which are moving in different directions but are spatially proximal can be clearly separated despite any occlusion in the scene. The motion vector represents the relative velocity of an object compared to the camera platform; given the platform velocity, it can later be used to determine the absolute velocity of all the objects within a scene. With the inclusion of optical flow in the method, the average processing rate remains around 20 frames per second (50 ms per frame) for a 640x480 video stream. As with the clustering technique, the processing time changes slightly depending on how many objects are detected.

The following advantages are introduced by Edge Flow:

1. It works well with partially occluded texture patches and keeps them separate until completely occluded,

2. It rediscovers the texture patches post-occlusion,

3. Static and moving texture patches are clearly separable, and

4. The processing speed combined with the first two components of Edge Flow remains real time (between 25 – 40 fps depending on the number of texture patches discovered in a scene).

This is significantly faster than other methods and still leaves some headroom for additional processing. An example of the occlusion discrimination capabilities can be seen in figure
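The gradient-magnitude linking described earlier, together with the car example, can be illustrated with a toy sketch. The text does not give the exact linking rule, so the one below is an assumption: neighbouring pixels are joined into the same texture patch when their gradient magnitudes differ by no more than a tolerance. The function name `link_texture_patches`, the `tolerance` parameter and the magnitude values are all hypothetical.

```python
from collections import deque


def link_texture_patches(magnitudes, tolerance):
    """Label 4-connected pixels whose gradient magnitudes differ by <= tolerance.

    magnitudes: 2D list of gradient magnitudes, one value per pixel.
    Returns (labels, count), where labels holds a patch id per pixel.
    Assumed rule for illustration; not the linking rule of Edge Flow itself.
    """
    rows, cols = len(magnitudes), len(magnitudes[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # pixel already belongs to a patch
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    # Similarity is tested between adjacent pixels only.
                    if abs(magnitudes[ny][nx] - magnitudes[y][x]) <= tolerance:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


# Toy "car": bonnet and boot share a texture (5) but are separated by the roof (9).
car = [
    [5, 5, 5, 9, 9, 9, 5, 5, 5],
    [5, 5, 5, 9, 9, 9, 5, 5, 5],
]
labels, count = link_texture_patches(car, tolerance=1)
print(count)      # 3
print(labels[0])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Although the bonnet and boot have identical magnitudes, they receive different labels because the roof breaks their spatial contiguity, which mirrors the three-patch description of the car.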
