2.3 Generic Moving Object Detection
2.3.2 Optical Flow versus H.264/AVC Motion Vector
Optical flow is one of the essential algorithms for moving object detection. It estimates
the velocity of movement of a brightness pattern in an image pair.
The earliest methods for optical flow estimation were developed by Horn and
Schunck (1981) and Lucas and Kanade (1981). The basic assumptions for optical flow
are brightness constancy, spatial coherence and temporal persistence. These
assumptions result in a set of equations relating the intensity gradients of pixels in
successive frames. The optical flow field is then obtained by minimising the
associated cost function. Optical flow is a pixel-based computation and is therefore
computationally expensive for embedded systems. Studies show that one Digital Signal Processor is
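As a brief illustration of how the assumptions above lead to equations on the intensity gradients, the standard textbook formulation is sketched below. The notation is introduced here and is not taken from the original text: I is the image intensity, (u, v) is the flow vector at a pixel, I_x, I_y and I_t are the partial derivatives of I, and \alpha is a smoothness weight. The first line is the brightness constancy constraint after a first-order expansion; the second is a cost function of the Horn-Schunck type that adds a spatial smoothness penalty.

```latex
% Brightness constancy over a short interval \delta t, linearised:
I(x, y, t) = I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0

% Horn-Schunck type cost function (data term plus smoothness term):
E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2
        + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big] \, dx \, dy
```

The single constraint has two unknowns per pixel, which is why an additional assumption such as spatial coherence is needed to obtain a unique flow field.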
Motion vectors (MVs) in H.264/AVC video encoding are determined by minimising
a cost function (J) that essentially consists of a distortion term (D) and a rate term
(R), as shown in Equation (2.1). J is known as the Rate Distortion Optimisation (RDO)
cost. The distortion term (D) is the matching function, which is usually evaluated by the
Sum of Absolute Differences (SAD) given in Equation (2.2), where s is
the signal from the original video, c is the signal from the coded video, $B_x \times B_y$ is the
block size for the evaluation, and $\mathbf{m} = (m_x, m_y)^T$ is the motion vector (MV).

\[ J = D + \lambda R \tag{2.1} \]

\[ \mathrm{SAD}(s, c(\mathbf{m})) = \sum_{y=1}^{B_y} \sum_{x=1}^{B_x} \left| s(x, y) - c(x - m_x,\, y - m_y) \right| \tag{2.2} \]

Each MV in a H.264/AVC video encoder represents an image block of variable size of
either 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4, depending on the decision of the
motion estimation algorithm (Chiu and Siu, 2010). Motion vectors are generated when
a motion estimation algorithm is run during video encoding. The motion vectors
represent the displacement of blocks between successive frames. The goal of motion
estimation in H.264 video compression is to achieve high quality video with the lowest
possible bit rate by correlating the patterns in the past video frames to the current video
frame. Because this objective is coding efficiency rather than true motion, motion vectors
taken directly from the motion estimation stage of video compression contain many outliers,
which must be eliminated before accurate moving object detection can be performed.
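The SAD of Equation (2.2) and the RDO cost of Equation (2.1) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the encoder's implementation: the function names `sad` and `rd_cost`, the argument layout, and the choice to reject displaced blocks that fall outside the reference frame are all hypothetical. Frames are assumed to be 2-D greyscale arrays indexed as [row, column].

```python
import numpy as np

def sad(cur, ref, top, left, mv, block=(16, 16)):
    """Sum of absolute differences for one block, following Equation (2.2).

    cur, ref : 2-D arrays (current frame s and reference frame c).
    top, left: position of the block's top-left pixel in `cur`.
    mv       : (m_x, m_y) motion vector; the reference block is read at
               (x - m_x, y - m_y), as in Equation (2.2).
    block    : (B_y, B_x) block size in rows and columns.
    """
    by, bx = block
    mx, my = mv
    r0, c0 = top - my, left - mx
    # Assumption: displaced blocks must lie fully inside the reference frame.
    if r0 < 0 or c0 < 0 or r0 + by > ref.shape[0] or c0 + bx > ref.shape[1]:
        raise ValueError("displaced block lies outside the reference frame")
    # Widen the type so that uint8 subtraction cannot wrap around.
    s = cur[top:top + by, left:left + bx].astype(np.int64)
    c = ref[r0:r0 + by, c0:c0 + bx].astype(np.int64)
    return int(np.abs(s - c).sum())

def rd_cost(distortion, rate_bits, lam):
    """Rate-distortion cost J = D + lambda * R of Equation (2.1)."""
    return distortion + lam * rate_bits
```

A candidate MV with a slightly larger SAD can still win if it is cheaper to code, which is one reason the selected MV need not follow the true motion.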
Motion estimation is computationally very expensive (Chan and Siu, 2001). To
reduce the computational cost of finding the best match, many fast search algorithms
have been studied. A set of reference software is publicly available
from the Heinrich Hertz Institute (HHI 2012) and is commonly used for educational
purposes and for benchmarking among different implementation approaches of
minimum sum of absolute difference (SAD) value inside the search window, three fast
search algorithms, namely Uneven Multi-Hexagon Search (UMHexagonS) (Chen et al.
2002), Simplified Hexagon Search (SHS) (Yi et al. 2005) and Enhanced Predictive
Zonal Search (EPZS) (Tourapis and Tourapis 2003) are included in the reference
software. These fast search algorithms mainly comprise three steps, namely
initial predictor selection, adaptive early termination, and prediction refinement.
The initial predictor selection stage selects an MV predictor from a set of candidate
predictors that are likely to give good estimation results. Instead of examining all possible
positions in a search window to determine the best predictor, these fast search
algorithms examine only a smaller set of positions chosen according to temporal and/or
spatial constraints.
In the adaptive early termination stage, the distortion evaluated by SAD is examined.
If it is smaller than a threshold determined by the minimum
distortion values of previously examined blocks, the MV search is terminated.
In the prediction refinement stage, the MV is refined by searching with a search
pattern around the best predictor found so far. The search pattern is designed to reduce
the chance of being trapped in a local minimum, and to reduce the number of
search positions required, for computational efficiency.
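The three stages described above can be sketched as a single search routine. This is a simplified illustration and not UMHexagonS, SHS or EPZS: the function name `fast_search`, the four-point diamond pattern, the `max_steps` limit and the way the threshold is passed in are all assumptions made for the example. In particular, deriving the threshold from the minimum distortions of previously examined blocks is left to the caller.

```python
import numpy as np

def sad(cur, ref, top, left, mv, block=16):
    """SAD of Equation (2.2); returns infinity if the block leaves the frame."""
    mx, my = mv
    r0, c0 = top - my, left - mx
    if r0 < 0 or c0 < 0 or r0 + block > ref.shape[0] or c0 + block > ref.shape[1]:
        return float("inf")
    s = cur[top:top + block, left:left + block].astype(np.int64)
    c = ref[r0:r0 + block, c0:c0 + block].astype(np.int64)
    return int(np.abs(s - c).sum())

# Assumed refinement pattern: the four neighbours of the current centre.
DIAMOND = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def fast_search(cur, ref, top, left, predictors, threshold, block=16, max_steps=32):
    """Return (best_mv, best_sad) for one block using a three-stage search."""
    # Stage 1: initial predictor selection. Only the supplied candidates
    # (for example spatial or temporal neighbours) are evaluated.
    best_mv, best_cost = None, float("inf")
    for mv in predictors:
        cost = sad(cur, ref, top, left, mv, block)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    if best_mv is None:
        raise ValueError("no predictor points inside the reference frame")

    # Stage 2: adaptive early termination. Stop if the match is already
    # better than the threshold supplied by the caller.
    if best_cost < threshold:
        return best_mv, best_cost

    # Stage 3: prediction refinement. Repeatedly test the pattern around
    # the current best position until nothing improves.
    for _ in range(max_steps):
        centre, improved = best_mv, False
        for dx, dy in DIAMOND:
            cand = (centre[0] + dx, centre[1] + dy)
            cost = sad(cur, ref, top, left, cand, block)
            if cost < best_cost:
                best_mv, best_cost, improved = cand, cost, True
        if not improved or best_cost < threshold:
            break
    return best_mv, best_cost
```

A small pattern like this one keeps the number of SAD evaluations low but is easily trapped in a local minimum, which is the trade-off the search patterns of the real algorithms are designed to manage.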
With a fixed video frame rate and a small frame-to-frame interval, optical flow can also be
estimated by a block matching approach, such as that used in the motion estimation
process in the H.264/AVC encoder (Davis and Karul et al., 1995, Chi and Tran et al.,
2007). The optical flow can simply be estimated by the motion vector divided by the
The equivalence of motion estimation by the block matching and optical flow methods
implies that the MVs for video coding can be used for moving object detection and
vice versa. However, the ultimate goal of MVs in the video encoder is to
achieve the best possible coding efficiency. The resulting MVs are not guaranteed to
represent the true motion of objects in the scene, and are therefore noisy when used for moving
object detection. Making use of such noisy motion information for moving object
detection is a challenging task. Also, video coding must be executed in real time when
encoding live video, without frame loss or reduced frame rate. This implies that the
moving object detection algorithm for use with MVs from H.264/AVC video encoding
must be highly efficient to allow completion within the duration between successive
frames.
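To make the idea of noisy MVs concrete, the sketch below applies a 3x3 component-wise median filter to a block-level MV field. This is shown purely as a generic, inexpensive way of suppressing isolated outliers; it is not presented as the method used in this work. The function name `median_filter_mvs` and the (rows, cols, 2) array layout are assumptions made for the example.

```python
import numpy as np

def median_filter_mvs(mv_field):
    """Component-wise 3x3 median filter over a block-level MV field.

    mv_field: array of shape (rows, cols, 2) holding (m_x, m_y) per block.
    Returns a float array of the same shape.
    """
    rows, cols, _ = mv_field.shape
    # Replicate the border so that every block has a full 3x3 neighbourhood.
    padded = np.pad(mv_field, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Stack the nine shifted views and take the median across them.
    windows = np.stack(
        [padded[dy:dy + rows, dx:dx + cols] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)
```

The filter is a single pass of array operations over the MV field, which has far fewer elements than the frame has pixels, so it fits the requirement of completing within one frame interval.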