Optical Flow versus H.264/AVC Motion Vector

2.3 Generic Moving Object Detection

2.3.2 Optical Flow versus H.264/AVC Motion Vector

Optical flow is one of the essential algorithms for moving object detection. It estimates

the velocity of movement of a brightness pattern in an image pair.

The earliest methods for optical flow estimation were developed by Horn and

Schunck (1981) and Lucas and Kanade (1981). The basic assumptions for optical flow

are brightness constancy, spatial coherence and temporal persistence. These

assumptions result in a set of equations relating the intensity gradients of pixels in

successive frames. The resulting optical flow field can be solved by minimising the

cost function involved. Optical flow is a pixel-based computation and is therefore

computationally expensive for embedded systems. Studies show that one Digital Signal Processor is
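As a sketch of how the assumptions above lead to equations in the intensity gradients, the standard textbook formulation can be written as follows. The symbols $I$ (image intensity), $(u, v)$ (flow components), $W$ (a local window) and $\alpha$ (a smoothness weight) are not defined in this section and are introduced here only for illustration.

```latex
\begin{align}
  % Brightness constancy: a moving pattern keeps its intensity
  I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) &= I(x, y, t) \\
  % First-order Taylor expansion (small motion, temporal persistence)
  I_x u + I_y v + I_t &= 0 \\
  % Lucas and Kanade: least squares over a window W (spatial coherence)
  (u, v) &= \arg\min_{u, v} \sum_{(x, y) \in W} \left( I_x u + I_y v + I_t \right)^2 \\
  % Horn and Schunck: global cost with a smoothness term weighted by alpha
  E(u, v) &= \iint \left[ \left( I_x u + I_y v + I_t \right)^2
            + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx\,dy
\end{align}
```

The single constraint in the second line has two unknowns per pixel, which is why a window-based or global cost must be minimised to recover the flow field.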

Motion vectors (MVs) in H.264/AVC video encoding are determined by minimising

the cost function (J), which essentially consists of a distortion term (D) and a rate term

(R), as shown in Equation (2.1). J is known as the Rate Distortion Optimisation (RDO)

cost. The distortion term (D) is the matching function that is usually evaluated by the

Sum of Absolute Differences (SAD), given in Equation (2.2), where s is

the signal from the original video, c is the signal from the coded video, $B_x \times B_y$ is the

block size for the evaluation, and $m = (m_x, m_y)^T$ is the motion vector (MV).

$$J = D + \lambda R \tag{2.1}$$

$$\mathrm{SAD}(s, c(m)) = \sum_{x=1}^{B_x} \sum_{y=1}^{B_y} \left| s(x, y) - c(x - m_x,\, y - m_y) \right| \tag{2.2}$$
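To make Equations (2.1) and (2.2) concrete, the following minimal Python sketch evaluates the SAD for one block and combines it with a rate term. The function names, the block origin `(x0, y0)`, and the representation of frames as lists of rows are assumptions made for illustration; they are not taken from any encoder implementation.

```python
def sad(s, c, m, x0, y0, bx, by):
    """Sum of absolute differences, following Equation (2.2).

    s, c   : frames as lists of rows, indexed frame[y][x] (assumed layout)
    m      : motion vector (mx, my)
    x0, y0 : top-left corner of the block in s (hypothetical parameters)
    bx, by : block width and height
    The caller must ensure that the displaced block lies inside c.
    """
    mx, my = m
    total = 0
    for y in range(y0, y0 + by):
        for x in range(x0, x0 + bx):
            total += abs(s[y][x] - c[y - my][x - mx])
    return total


def rd_cost(distortion, rate_bits, lam):
    """Rate-distortion cost J = D + lambda * R, following Equation (2.1)."""
    return distortion + lam * rate_bits


if __name__ == "__main__":
    # Synthetic 8x8 frames: s is c displaced by the motion vector (2, 1).
    c = [[10 * y + x for x in range(8)] for y in range(8)]
    s = [[c[y - 1][x - 2] if (y >= 1 and x >= 2) else 0 for x in range(8)]
         for y in range(8)]
    d_true = sad(s, c, (2, 1), 3, 3, 2, 2)   # candidate equal to the true shift
    d_zero = sad(s, c, (0, 0), 3, 3, 2, 2)   # zero-motion candidate
    print(d_true, d_zero)
```

In this sketch the candidate that matches the true displacement yields a lower distortion than the zero vector; in an encoder the rate term can still favour a different vector if it is cheaper to code.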

Each MV in an H.264/AVC video encoder represents an image block of variable size:

16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4, depending on the decision of the

motion estimation algorithm (Chiu and Siu, 2010). Motion vectors are generated when

a motion estimation algorithm is run during video encoding. The motion vectors

represent the displacement of blocks between successive frames. The goal of motion

estimation in H.264 video compression is to achieve high quality video with the lowest

possible bit rate by correlating the patterns in the past video frames to the current video

frame, rather than to describe the true motion of objects. Consequently, if the motion

vectors directly available from motion estimation for video compression are used for

moving object detection, there will be many outliers that need to be eliminated before

accurate moving object detection can be performed.
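The text does not specify at this point how the outliers are eliminated. As one generic illustration only (a component-wise median filter, which is a common smoothing choice and not necessarily the method used in this work), isolated outlier vectors in a block-level MV field could be suppressed as follows. The field layout as a grid of `(mx, my)` tuples is an assumption.

```python
from statistics import median_low


def median_filter_mvs(mv_field):
    """Component-wise 3x3 median over a 2-D grid of (mx, my) motion vectors.

    mv_field is a list of rows, one (mx, my) tuple per block (assumed layout).
    At the borders only the neighbours that exist are used. median_low keeps
    the result an actual sample value, so integer components stay integers.
    """
    rows, cols = len(mv_field), len(mv_field[0])
    out = []
    for r in range(rows):
        out_row = []
        for col in range(cols):
            neigh = [mv_field[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, col - 1), min(cols, col + 2))]
            out_row.append((median_low(v[0] for v in neigh),
                            median_low(v[1] for v in neigh)))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Uniform motion (2, 0) with a single outlier in the middle block.
    field = [[(2, 0)] * 3 for _ in range(3)]
    field[1][1] = (-7, 5)
    print(median_filter_mvs(field)[1][1])
```

A filter of this kind removes isolated vectors that disagree with their neighbours, but it also smooths genuine motion boundaries, so it is shown only to illustrate what eliminating outliers can mean in practice.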

Motion estimation is highly computationally expensive (Chan and Siu, 2001). To

reduce the computational cost for finding the best match, there have been studies on

many fast search algorithms. Reference software is publicly available from the

Heinrich Hertz Institute (HHI 2012) and is commonly used for educational

purposes and for benchmarking among different implementation approaches of

minimum sum of absolute difference (SAD) value inside the search window, three fast

search algorithms, namely Uneven Multi-Hexagon Search (UMHexagonS) (Chen et al.

2002), Simplified Hexagon Search (SHS) (Yi et al. 2005) and Enhanced Predictive

Zonal Search (EPZS) (Tourapis and Tourapis 2003) are included in the reference

software. These fast search algorithms mainly comprise three steps, namely the

initial predictor selection, adaptive early termination, and prediction refinement.

The initial predictor selection stage selects an MV predictor from a set of candidates

that are likely to give good estimation results. Instead of examining all possible

positions in a search window to determine the best predictor, these fast search

algorithms only examine a smaller set of positions according to some temporal and/or

spatial constraints.

In the adaptive early termination stage, the distortion of the current best candidate,

evaluated by SAD, is examined. If it is smaller than a threshold determined by the minimum

distortion values of previously examined blocks, the MV search is terminated.

In the prediction refinement stage, the MV is refined by searching with a search pattern

around the best predictor found so far. The search pattern is designed to reduce

the chance of being trapped in a local minimum, and to reduce the number of search

points required, for computational efficiency.
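The three stages described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not UMHexagonS, SHS or EPZS: the refinement uses a plain four-point small diamond pattern, the early-termination `threshold` is passed in by the caller instead of being derived from previously examined blocks, and all function and parameter names are hypothetical.

```python
def sad(cur, ref, x0, y0, mv, b):
    """SAD of the b-by-b block at (x0, y0) in cur against ref displaced by mv.

    Uses the convention of Equation (2.2): cur(x, y) is compared with
    ref(x - mx, y - my). Returns None if the displaced block leaves the frame.
    """
    mx, my = mv
    h, w = len(ref), len(ref[0])
    if not (0 <= x0 - mx and x0 - mx + b <= w and
            0 <= y0 - my and y0 - my + b <= h):
        return None
    return sum(abs(cur[y][x] - ref[y - my][x - mx])
               for y in range(y0, y0 + b) for x in range(x0, x0 + b))


SMALL_DIAMOND = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed refinement pattern


def fast_search(cur, ref, x0, y0, b, predictors, threshold, max_steps=16):
    """Return (best_mv, best_sad) for one block using three simplified stages."""
    # Stage 1: initial predictor selection. Only the zero vector and the
    # supplied predictors (e.g. MVs of neighbouring blocks) are examined.
    best_mv, best_cost = None, None
    for mv in [(0, 0)] + list(predictors):
        cost = sad(cur, ref, x0, y0, mv, b)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_mv, best_cost = mv, cost

    # Stage 2: adaptive early termination. Stop if the match is good enough.
    if best_cost <= threshold:
        return best_mv, best_cost

    # Stage 3: prediction refinement around the best predictor found so far.
    for _ in range(max_steps):
        improved = False
        cx, cy = best_mv
        for dx, dy in SMALL_DIAMOND:
            cand = (cx + dx, cy + dy)
            cost = sad(cur, ref, x0, y0, cand, b)
            if cost is not None and cost < best_cost:
                best_mv, best_cost, improved = cand, cost, True
        if not improved or best_cost <= threshold:
            break
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic frames: cur is ref displaced by the motion vector (2, 1).
    ref = [[x * x + y * y for x in range(16)] for y in range(16)]
    cur = [[ref[y - 1][x - 2] if (y >= 1 and x >= 2) else 0
            for x in range(16)] for y in range(16)]
    print(fast_search(cur, ref, 6, 6, 4, predictors=[(1, 0)], threshold=0))
```

A small pattern such as this one examines very few positions per step but can stop in a local minimum on textured content, which is the trade-off the larger hexagon and zonal patterns in the reference software are designed to manage.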

At a fixed video frame rate with a small frame-to-frame interval, optical flow can also be

estimated by the block matching approach, such as that used in the motion estimation

process in the H.264/AVC encoder (Davis and Karul et al., 1995, Chi and Tran et al.,

2007). The optical flow can simply be estimated by the motion vector divided by the

The equivalence of motion estimation by the block matching and optical flow methods

implies that the MVs for video coding can be used for moving object detection and

vice versa. However, the ultimate goal of the use of MVs in the video encoder is to

achieve the best coding efficiency possible. The resultant MVs are not guaranteed to

represent the true motion of objects in the scene. They are therefore noisy for moving

object detection. It is a challenging task to make use of such noisy motion information

for moving object detection. Also, the video coding must be executed in real time for

encoding live video without frame loss or reduced frame rate. This implies that the

moving object detection algorithm for use with MVs from H.264/AVC video encoding

must be highly efficient to allow completion within the duration between successive

frames.

In document Moving object detection for automobiles by the shared use of H 264/AVC motion vectors : innovation report (Page 37-40)