7.5.1 Predictive Displacement Estimation
Currently, the region for template matching is limited to 16x16 pixels around the
displacement given by the averaged amplitude and direction of the MVs of the identified
region, which may contain relatively fast-moving objects. Since the target object is
moving, the average amplitude and direction of the MVs should be compensated by the
predicted change in speed (or acceleration) of the object. This change in speed can be
predicted from the amplitudes of the MVs in previous frames. With a basic kinematic
model of the object's movement, the prediction can be refined using a Kalman
filter.
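The compensation described above can be illustrated with a minimal sketch. It assumes a constant-acceleration model estimated by finite differences of the last two averaged MVs; it is not a Kalman filter, and the function names, the `half_size` parameter, and the (dx, dy) tuple representation are hypothetical choices made only for illustration.

```python
def predict_displacement(mv_history):
    """Predict the next per-frame displacement (dx, dy) in pixels.

    mv_history: list of averaged MVs (dx, dy), oldest first, one per frame.
    Assumption: the change between the last two MVs (the acceleration)
    stays the same for the next frame.
    """
    if not mv_history:
        raise ValueError("mv_history must contain at least one MV")
    last = mv_history[-1]
    if len(mv_history) == 1:
        # No acceleration estimate is available yet; reuse the last MV.
        return last
    prev = mv_history[-2]
    accel = (last[0] - prev[0], last[1] - prev[1])
    return (last[0] + accel[0], last[1] + accel[1])


def search_window(position, predicted, half_size=8):
    """Return (x0, y0, x1, y1) of a window centred on the predicted position.

    half_size=8 gives the 16x16 pixel region mentioned in the text.
    """
    cx = position[0] + predicted[0]
    cy = position[1] + predicted[1]
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)


if __name__ == "__main__":
    history = [(2, 0), (4, 1)]           # averaged MVs of two previous frames
    predicted = predict_displacement(history)
    print(predicted)                      # (6, 2)
    print(search_window((100, 50), predicted))
```

A Kalman filter would replace the finite-difference step with a state estimate that also weighs the measurement noise of the MVs.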
7.5.2 The Use of Bi-Prediction Motion Vector
In the current implementation, the MVs from P-frames were used to evaluate the
movements of objects in successive images. As discussed in Chapter 2, MVs from
H.264/AVC can be erroneous because they are intended primarily for video compression
rather than for precise object motion estimation. One very important piece of
information in B-frames is the bi-prediction motion vectors of some macroblocks. These
bi-prediction MVs can be used to refine the accuracy of the MVs in P-frames. By taking
into account the spatial and temporal consistency of object movements in successive
frames, MV outliers can be detected and eliminated. In this way, the false
positive rate can be reduced further.
One point to note is that MVs are estimated using the current frame as the coordinate
reference. The coordinates in the current frame are always in integer form, aligning to
the macroblock boundaries. For a bi-predictive macroblock in a B-frame, each pixel in
to the two bi-predictive MVs. The bi-predictive MVs can have sub-pixel precision of up
to 1/4 pixel rather than integer precision. Also, they need not be aligned to the
macroblock boundaries.
Therefore, to utilise the bi-predictive MVs in B-frames to improve the reliability of
MVs in P-frames, the coordinate reference of the MVs in the B-frame has to be
changed to use the current P-frame as reference. This involves constructing an
MV map by rounding or truncating the sub-pixel MVs to integer pixels. The MV map
can then be compared with the MVs obtained from the current P-frame.
MVs in the current P-frame can be discarded if the discrepancies are larger than certain
thresholds, indicating that they are temporally inconsistent with the corresponding MVs
found in the previous B-frame.
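The rounding and thresholding steps can be sketched as follows. This is only an illustration under strong assumptions: the B-frame MVs are taken to be already re-referenced to P-frame block positions and stored in quarter-pixel units, and the dictionary layout, function names, and the `threshold` value are hypothetical.

```python
def quarter_pel_to_int(mv_q):
    """Round a quarter-pixel MV (dx, dy) to integer pixels (round half up)."""
    # (v + 2) // 4 equals floor(v / 4 + 0.5) for integer v.
    return ((mv_q[0] + 2) // 4, (mv_q[1] + 2) // 4)


def build_mv_map(b_mvs_quarter_pel):
    """Build an integer-pixel MV map from quarter-pixel B-frame MVs.

    b_mvs_quarter_pel: {(block_x, block_y): (dx_q, dy_q)}, assumed to be
    indexed by the block position in the current P-frame already.
    """
    return {pos: quarter_pel_to_int(mv) for pos, mv in b_mvs_quarter_pel.items()}


def filter_p_frame_mvs(p_mvs, b_mv_map, threshold=2):
    """Discard P-frame MVs that disagree with the B-frame MV map.

    A P-frame MV is discarded when either component differs from the mapped
    B-frame MV by more than `threshold` pixels. Blocks without a B-frame
    entry are kept, since there is nothing to compare them with.
    """
    kept = {}
    for pos, (dx, dy) in p_mvs.items():
        ref = b_mv_map.get(pos)
        if ref is not None:
            if max(abs(dx - ref[0]), abs(dy - ref[1])) > threshold:
                continue  # temporally inconsistent: treat as an outlier
        kept[pos] = (dx, dy)
    return kept


if __name__ == "__main__":
    b_map = build_mv_map({(0, 0): (9, -6), (1, 0): (4, 0)})
    p_mvs = {(0, 0): (2, -1), (1, 0): (10, 0), (2, 0): (3, 3)}
    print(filter_p_frame_mvs(p_mvs, b_map))
```

Truncation instead of rounding would only change `quarter_pel_to_int`; the comparison step stays the same.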
7.5.3 The Algorithm for Template Matching
Currently, template matching is performed by a simple exhaustive spiral search about
the estimated displacement from the position in the previous frame. The search can be
computationally inefficient, especially when a matching template cannot be found
inside the defined search range. The speed of the search can be improved by
employing a smarter search algorithm. Fast search algorithms similar to those used for
H.264/AVC motion estimation can be considered, such as the Diamond
Search algorithm (Tham and S. et al., 1998), the UMHexagonS algorithm (Chen and Xu et
al., 2006), and the Image Edge Assisted Search algorithm (Chan and Siu, 2001). According
to the test results of these authors, all these fast search algorithms can reduce the
computation cost by at least 80%.
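To give an idea of how such a fast search differs from an exhaustive one, the sketch below implements a basic diamond-pattern search with a SAD cost. It is a simplified illustration and not necessarily the exact variant of the cited papers; the frame and template are assumed to be lists of rows of integer pixel values, and all names and the `max_steps` limit are hypothetical.

```python
# Offsets (dx, dy) of the large and small diamond patterns, centre first.
LARGE_DIAMOND = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1),
                 (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SMALL_DIAMOND = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def sad(frame, template, x, y):
    """Sum of absolute differences with the template's top-left at (x, y)."""
    return sum(abs(frame[y + j][x + i] - template[j][i])
               for j in range(len(template))
               for i in range(len(template[0])))


def diamond_search(frame, template, start, max_steps=32):
    """Return ((x, y), cost) of the best match found from `start`."""
    h, w = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])

    def cost(p):
        x, y = p
        if x < 0 or y < 0 or x + tw > w or y + th > h:
            return float("inf")  # candidate falls outside the frame
        return sad(frame, template, x, y)

    best = start
    for _ in range(max_steps):
        candidates = [(best[0] + dx, best[1] + dy) for dx, dy in LARGE_DIAMOND]
        nxt = min(candidates, key=cost)  # ties favour the centre (listed first)
        if nxt == best:
            break  # minimum is at the centre: switch to the small diamond
        best = nxt
    candidates = [(best[0] + dx, best[1] + dy) for dx, dy in SMALL_DIAMOND]
    best = min(candidates, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Synthetic 16x16 frame with a linear intensity ramp.
    frame = [[10 * x + 100 * y for x in range(16)] for y in range(16)]
    template = [row[9:13] for row in frame[6:10]]  # 4x4 patch at (9, 6)
    print(diamond_search(frame, template, start=(5, 6)))
```

Only a handful of candidate positions are evaluated per step, which is where the saving over an exhaustive spiral search comes from; the price is that the search may settle in a local minimum of the cost.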
Also, the decision on successful template matching relies on the evaluation of sum of
cost for SAD evaluation can be reduced by using an integral image (Schweitzer and Bell
et al., 2002, Viola and Jones, 2001). The properties of the integral image allow rapid and
efficient computation. Figure 7-4 illustrates how the integral image facilitates rapid
computation of sums and differences of rectangular regions. Figure 7-4(a) shows a
rectangle extending from Po(0,0) to P(x,y). The sum of pixel values above and to the
left of P(x,y) is $P_{x,y}$, as expressed in Equation (7.5).
$$P_{x,y} = \sum_{x' \le x,\; y' \le y} i(x', y') \tag{7.5}$$
where $i(x', y')$ is the pixel value at point $(x', y')$. The relationship of areas A, B, C
and D to the points P1, P2, P3 and P4, as shown in the integral image in Figure 7-4(b),
can be expressed as Equation (7.6).
$$P_1 = A, \quad P_2 = A + B, \quad P_3 = A + C, \quad P_4 = A + B + C + D \tag{7.6}$$
By solving these equations, D can be expressed as Equation (7.7).
$$D = P_1 + P_4 - P_2 - P_3 \tag{7.7}$$
So, the sum of pixel values in area D can be evaluated very quickly from the integral
image values at the four corners of D.
Figure 7-4: Integral image. (a) sum of pixel values above and left of P(x,y), (b) sum of pixel values above and left of P4(x,y) = A+B+C+D
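Equations (7.5) to (7.7) can be checked with a short sketch. It assumes the image is a list of rows of integer pixel values and pads the integral image with a leading row and column of zeros so that the four corner look-ups never fall outside the array; the function names are hypothetical.

```python
def integral_image(img):
    """Return the integral image of `img`, padded with a zero row and column.

    ii[y + 1][x + 1] holds the sum of all pixels img[y'][x'] with
    x' <= x and y' <= y, i.e. P(x, y) of Equation (7.5).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x <= x1 and y0 <= y <= y1 (inclusive)."""
    p1 = ii[y0][x0]          # P1 = A
    p2 = ii[y0][x1 + 1]      # P2 = A + B
    p3 = ii[y1 + 1][x0]      # P3 = A + C
    p4 = ii[y1 + 1][x1 + 1]  # P4 = A + B + C + D
    return p1 + p4 - p2 - p3  # D, Equation (7.7)


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once the integral image is built in a single pass, every rectangular sum costs four look-ups regardless of the rectangle size.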
In addition to the use of a search algorithm, the moving direction of the identified
template can also be estimated from the MVs from the H.264/AVC encoder. During
the HV mode, the moving direction of the blocks inside the template identified in
the HG mode can be evaluated by referring to the MVs from the H.264/AVC
encoder. Since the MVs from the H.264/AVC encoder describe the movement from
the current frame to the previous frame, a reverse MV map needs to be constructed so
that the movement of the blocks in the template from the previous frame to the current
frame can be evaluated. After that, if the resultant movement of the whole template is
consistent with the estimated movement, the HV can be regarded as successful.
Tracking can be performed in a similar way to the HV mode.
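A reverse MV map of this kind can be sketched as below. The sketch assumes integer-pixel MVs stored per block position in the current frame, averages the reversed MVs that land inside the template region, and accepts the hypothesis when the average is within a tolerance of the estimated movement. The data layout, function names, and the `tol` value are hypothetical.

```python
def reverse_mv_map(mvs):
    """Reverse encoder MVs so that they point from the previous frame forward.

    mvs: {(x, y): (dx, dy)} where (x, y) is a block position in the current
    frame and (dx, dy) points to its match in the previous frame.
    Returns {(x + dx, y + dy): (-dx, -dy)} keyed by previous-frame position.
    If two blocks map to the same position, the later one overwrites.
    """
    return {(x + dx, y + dy): (-dx, -dy) for (x, y), (dx, dy) in mvs.items()}


def template_motion(rev_map, region):
    """Average forward MV of the entries inside region (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    inside = [mv for (x, y), mv in rev_map.items()
              if x0 <= x <= x1 and y0 <= y <= y1]
    if not inside:
        return None  # no MV information for this template
    n = len(inside)
    return (sum(m[0] for m in inside) / n, sum(m[1] for m in inside) / n)


def is_consistent(measured, estimated, tol=2.0):
    """True if both components agree within `tol` pixels."""
    if measured is None:
        return False
    return (abs(measured[0] - estimated[0]) <= tol
            and abs(measured[1] - estimated[1]) <= tol)


if __name__ == "__main__":
    mvs = {(16, 16): (-4, 0), (32, 16): (-4, -2)}
    rev = reverse_mv_map(mvs)
    motion = template_motion(rev, (0, 0, 31, 31))
    print(motion, is_consistent(motion, (5, 1)))
```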
7.5.4 Reducing False-Positive Detection
As mentioned in Chapter 4.4.4, there were cases of false-positive HG and HV in
MV-based moving object detection. A false-positive hypothesis can pass the HV stage
because the displacement identified in the HV stage is within the allowable limit of the
displacement estimated in the HG stage, even though the estimated
displacement found in HG is erroneous.
False-positive detection can be reduced by evaluating the direction of movement of
the template in the HV stage. If the template moves away from the ego vehicle between
the HG frame and the HV frame, the HV can be treated as failed. And if
the direction of movement of the template points to the FOE, and the amplitude of
the displacement is the same as the displacement due to ego motion, the HV can also