4.5 Optimal ME for WT-based video coding
4.5.3 Developing the criterion for a special case
Further analytical developments are possible if we refer to a specific (N, 0) LS, such as the (2, 0). Let us recall the equations for this special case:
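\[
\begin{aligned}
h_k(p) &= x_{2k+1}(p) - \frac{1}{2}\left[ x_{2k}\big(p + v_{2k+1\to 2k}(p)\big) + x_{2k+2}\big(p + v_{2k+1\to 2k+2}(p)\big) \right] \\
l_k(p) &= x_{2k}(p)
\end{aligned}
\]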
This means that, in this case, the trajectory $v^{(0)}$ for the frame $2k$ and for the first decomposition level is just a pair of vectors:
\[
v^{(0)} = \left\{ v_{2k+1\to 2k},\; v_{2k+1\to 2k+2} \right\}
\]
This expression can be generalized to the i-th decomposition level:
\[
v^{(i)} = \left\{ v_{2^{i-1}(2k+1)\to 2^{i-1}2k},\; v_{2^{i-1}(2k+1)\to 2^{i-1}(2k+2)} \right\}
\]
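As an illustration, the (2, 0) lifting step recalled above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: frames are lists of rows, one integer vector (dy, dx) is stored per pixel, and displaced positions falling outside the frame are clamped to the border. The names `mc_sample`, `lifting_2_0` and `pattern` are hypothetical and introduced only for this example.

```python
def mc_sample(frame, row, col, vec):
    """Read frame at (row, col) displaced by vec = (dy, dx).

    Assumption: positions outside the frame are clamped to the border.
    """
    rows, cols = len(frame), len(frame[0])
    r = min(max(row + vec[0], 0), rows - 1)
    c = min(max(col + vec[1], 0), cols - 1)
    return frame[r][c]


def lifting_2_0(x_prev, x_cur, x_next, v_prev, v_next):
    """One motion-compensated (2, 0) lifting step.

    x_prev, x_cur, x_next play the roles of x_{2k}, x_{2k+1}, x_{2k+2};
    v_prev[r][c] and v_next[r][c] are the vectors from x_cur towards
    x_prev and x_next at pixel (r, c).
    """
    rows, cols = len(x_cur), len(x_cur[0])
    h = [[x_cur[r][c] - 0.5 * (mc_sample(x_prev, r, c, v_prev[r][c])
                               + mc_sample(x_next, r, c, v_next[r][c]))
          for c in range(cols)] for r in range(rows)]
    l = [row[:] for row in x_prev]  # no update step: l_k = x_{2k}
    return h, l


# Toy usage: a synthetic pattern shifting one pixel to the right per frame.
def pattern(r, c):
    return (3 * r + 5 * c * c) % 17


ROWS, COLS = 4, 6
frames = [[[pattern(r, c - t) for c in range(COLS)] for r in range(ROWS)]
          for t in range(3)]
v_prev = [[(0, -1)] * COLS for _ in range(ROWS)]  # content comes from the left
v_next = [[(0, 1)] * COLS for _ in range(ROWS)]   # and moves on to the right
h, l = lifting_2_0(frames[0], frames[1], frames[2], v_prev, v_next)

# Same frames, but with null vectors everywhere.
zero = [[(0, 0)] * COLS for _ in range(ROWS)]
h_zero, _ = lifting_2_0(frames[0], frames[1], frames[2], zero, zero)
```

With the correct vectors, the high frequency band `h` vanishes at all pixels away from the left and right borders, whereas with null vectors (`h_zero`) it does not. This is the behaviour that the choice of the trajectory is meant to control.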
The optimal trajectory $v^{(i)*}$ is the one minimizing the variance of the high frequency band. Since this subband has zero mean, this is equivalent to minimizing its energy.
We can refer to the first high frequency subband without loss of generality: for the other bands, it suffices to refer to the suitably subsampled version of the input sequence. For the high frequency subband, we can simplify the notation of the lifting scheme as follows:
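\[
\begin{aligned}
h_k(p) &= x_{2k+1}(p) - \frac{1}{2}\left[ x_{2k}\big(p + F_{2k+1}(p)\big) + x_{2k+2}\big(p + B_{2k+1}(p)\big) \right] \\
l_k(p) &= x_{2k}(p)
\end{aligned}
\]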
where $B_k = v_{k\to k-1}$ and $F_k = v_{k\to k+1}$ as usual. The optimal trajectory is given by:
\[
v^{(0)*} = \arg\min_{B_{2k+1},\,F_{2k+1}} E\{h_k(p)\}
\]
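where we have
\[
\begin{aligned}
h_k(p) &= x_{2k+1}(p) - \frac{1}{2}\left[ x_{2k}\big(p + F_{2k+1}(p)\big) + x_{2k+2}\big(p + B_{2k+1}(p)\big) \right] \\
&= \frac{1}{2}\left[ x_{2k+1}(p) - x_{2k}\big(p + F_{2k+1}(p)\big) + x_{2k+1}(p) - x_{2k+2}\big(p + B_{2k+1}(p)\big) \right] \\
&= \frac{1}{2}\left( \epsilon_F + \epsilon_B \right)
\end{aligned}
\]
and $\epsilon_F$ [$\epsilon_B$] is the forward [backward] motion-compensated prediction error:
\[
\epsilon_F = x_{2k+1}(p) - x_{2k}\big(p + F_{2k+1}(p)\big), \qquad \epsilon_B = x_{2k+1}(p) - x_{2k+2}\big(p + B_{2k+1}(p)\big)
\]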
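This means that the optimal trajectory minimizes the energy of the sum of these errors. Developing further, and noting that $E[h_k(p)] = \frac{1}{4} E(\epsilon_B + \epsilon_F)$, we have to minimize
\[
\begin{aligned}
2\,E[h_k(p)] &= \frac{1}{2} E(\epsilon_B + \epsilon_F) \\
&= \frac{1}{2} E(\epsilon_B) + \frac{1}{2} E(\epsilon_F) + \langle \epsilon_B, \epsilon_F \rangle
\end{aligned}
\]
where the constant factor does not affect the minimizer. In conclusion,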
\[
B^{*}_{2k+1},\, F^{*}_{2k+1} = \arg\min_{B_{2k+1},\,F_{2k+1}} \left[ \frac{1}{2} E(\epsilon_B) + \frac{1}{2} E(\epsilon_F) + \langle \epsilon_B, \epsilon_F \rangle \right] \tag{4.8}
\]
Equation (4.8) is what we need in order to compare the optimal ME criterion with the usual MSE-based criterion. With the usual MSE-based criterion, we minimize $E(\epsilon_B)$ and $E(\epsilon_F)$ independently, so we probably attain a low value of the optimal criterion, but not necessarily the minimum, as we do not take the mixed term into account. This term grows larger as the two error images become more similar. This means that the optimal backward and forward vectors are not independent: they should produce error images that are as different as possible, since merely minimizing the energies of the error images is not enough. In other words, regions affected by a positive backward error should have a negative forward error, and vice versa.
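The role of the mixed term in (4.8) can be illustrated numerically. The following Python sketch uses purely hypothetical candidate error images (two per direction, two samples each), chosen by hand so that the effect is visible; they do not come from any real sequence. The function names are likewise assumptions made for this example. Independent selection picks, for each direction, the candidate with the smallest energy, while joint selection evaluates the bracketed quantity of (4.8) on every pair.

```python
from itertools import product


def energy(e):
    """Energy (sum of squares) of an error image given as a flat list."""
    return sum(v * v for v in e)


def inner(a, b):
    """Inner product <a, b> of two error images of equal size."""
    return sum(x * y for x, y in zip(a, b))


def criterion(e_b, e_f):
    """Bracketed quantity of (4.8)."""
    return 0.5 * energy(e_b) + 0.5 * energy(e_f) + inner(e_b, e_f)


def high_band_energy(e_b, e_f):
    """Energy of h = (e_b + e_f) / 2; equals criterion(e_b, e_f) / 2."""
    return energy([0.5 * (b + f) for b, f in zip(e_b, e_f)])


# Hypothetical error images produced by two candidate vectors per direction.
backward = {"b0": [2, 2], "b1": [-2, -3]}
forward = {"f0": [2, 2], "f1": [3, 0]}

# MSE-based choice: each direction is minimized on its own.
independent = (min(backward, key=lambda k: energy(backward[k])),
               min(forward, key=lambda k: energy(forward[k])))

# Choice according to (4.8): all pairs are compared, mixed term included.
joint = min(product(backward, forward),
            key=lambda bf: criterion(backward[bf[0]], forward[bf[1]]))

for name, (b, f) in (("independent", independent), ("joint", joint)):
    print(name, b, f, criterion(backward[b], forward[f]))
```

With these values, independent selection returns the pair (b0, f0), for which the criterion equals 16, while joint selection returns (b1, f0), for which it equals 0.5, even though b1 alone has a larger energy than b0. The two errors of the second pair have opposite signs and largely cancel in the high frequency band.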
Chapter 5
Motion Vector Encoding
There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.
WILLIAM SHAKESPEARE
Hamlet, Prince of Denmark, 1601
This chapter summarizes the main results obtained in motion vector field (MVF) coding with JPEG2000-based techniques. Motion information obtained by a generic ME algorithm is usually highly redundant, so, in order to obtain efficient coding, the motion vectors have to be compressed. In the previous chapter we saw several techniques for changing MV entropy directly at the ME stage. Afterward, MVs are supposed to be losslessly encoded and transmitted to the receiver, and, indeed, lossless methods are the object of the first and largest part of this chapter. However, an alternative approach is considered in Section 5.6, in which we perform ME without caring about the MV rate, that is, we look for the most accurate motion information, but then consider lossy MV coding techniques.
Another issue we considered in MV encoding was compatibility with the JPEG2000 standard. Since our target is to implement a video encoder with the highest possible compatibility with this standard, we felt that MVs should also be encoded by means of JPEG2000. Of course, as this is a still image coding algorithm, we cannot expect it to provide the best possible performance, but this appears to be compensated for by the compatibility with the standard.
Before describing the techniques for MV encoding, we present a first section where some statistics of MVs are given and analyzed. Then our techniques are described, together with experimental results. We tested the proposed techniques in many different configurations, in order to better understand their potential performance. Therefore, several sequences were considered, and motion estimation precision and block size were varied in order to investigate the influence of these parameters on vector coding.
5.1 Motion vector distribution
We evaluated the distribution of motion vector fields in the sequences “flowers and garden” (250 frames), “bus” (150 frames), and “foreman” (300 frames). All these sequences have significant motion content, but in the first one motion is quite regular; in the second one it is regular but complex, as many moving objects appear in the scene; and in the third one motion is quite chaotic.
Full-pixel and half-pixel precisions were considered. In Figures 5.1–5.6, backward and forward vectors for the first three temporal decomposition levels are shown. The log10 of the relative frequency is reported, with the null vector frequency located at the image center. We remark that the search area increases for successive temporal levels, since temporally distant frames can involve wide movements. This results in a more spread-out MV distribution at higher temporal levels. In any case, the distributions show good regularity and are fairly concentrated around the null vector. This comes from regularization techniques, which tend to assign a null vector when the estimation is not accurate.
From the analysis of these distributions, we can get information on the motion content of the sequences and some hints about how to encode those motion vectors.
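For reference, a distribution map of the kind described above can be computed as in the following Python sketch. It is a minimal example under stated assumptions: vectors are given as integer pairs (dy, dx) in units of the estimation precision, `search_range` is the largest admissible component magnitude, and vectors that never occur are marked with `None` rather than assigned a logarithm. The function name `log_relative_frequency` is hypothetical.

```python
import math
from collections import Counter


def log_relative_frequency(vectors, search_range):
    """Return a (2R+1) x (2R+1) grid of log10 relative frequencies.

    The null vector (0, 0) maps to the grid center [R][R];
    cells of vectors that never occur are left as None.
    """
    size = 2 * search_range + 1
    grid = [[None] * size for _ in range(size)]
    total = len(vectors)
    for (dy, dx), count in Counter(vectors).items():
        if abs(dy) > search_range or abs(dx) > search_range:
            raise ValueError("vector outside the search range")
        grid[dy + search_range][dx + search_range] = math.log10(count / total)
    return grid


# Toy usage: eight null vectors and two vectors pointing one pixel to the left.
sample = [(0, 0)] * 8 + [(0, -1)] * 2
grid = log_relative_frequency(sample, search_range=1)
```

In this toy case, the center cell holds log10(0.8), the cell to its left holds log10(0.2), and all other cells are empty.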
Figures 5.1 and 5.2 show that motion in the “flowers and garden” sequence is quite regular, with a dominant horizontal motion toward the left. This sequence is actually a camera panning over an almost static background.
From Fig. 5.3 and 5.4 we can deduce that in the “bus” sequence motion is more complex, even though we have mainly horizontal movements. Indeed, in this sequence we have a horizontal camera panning on a bus and moving cars.
The “foreman” sequence is characterized by a more complex motion, as appears from Fig. 5.5 and 5.6. Many null vectors are estimated, but the distribution exhibits several secondary peaks, coming from the different movements of the objects and of the camera in this sequence.
By analyzing these distributions and by taking a look at the video sequences themselves, we can conclude that MVFs for these sequences are quite spatially correlated, but can present significant temporal variations.
We also note that the distributions often have high values on the axes, i.e., many vectors with a single null component are estimated. As the ME algorithm we use does not allow vectors pointing outside the frame, vectors estimated near the border of the image often have a null component orthogonal to the border; see an example in Fig. 5.7 (left).
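The effect of such a constraint on the candidate set can be sketched as follows. This is only an illustrative assumption about how the constraint may be enforced (a candidate is kept when the displaced block lies entirely inside the frame); the function name `admissible_vectors` and all parameter values are hypothetical.

```python
def admissible_vectors(top, left, block, height, width, search):
    """Candidate vectors (dy, dx) keeping the displaced block inside the frame.

    (top, left) is the upper-left corner of a block x block region in a
    height x width frame; each component is searched in [-search, search].
    """
    return [(dy, dx)
            for dy in range(-search, search + 1)
            for dx in range(-search, search + 1)
            if 0 <= top + dy and top + dy + block <= height
            and 0 <= left + dx and left + dx + block <= width]


# Toy usage on a 32x32 frame with 8x8 blocks and a +/-2 pixel search.
inner_block = admissible_vectors(8, 8, 8, 32, 32, 2)    # away from borders
border_block = admissible_vectors(8, 0, 8, 32, 32, 2)   # on the left border
```

For the block on the left border, no candidate has a negative horizontal component, so a leftward motion can at best be represented with a null horizontal component, whereas the inner block keeps the full set of 25 candidates.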