4.1.5 Special consideration of all-zero blocks
The video coding standard H.265/HEVC supports very efficient signaling of coding units (CUs) that use motion-compensated prediction and whose transform coefficients are all equal to zero. This is done with one of the two syntax elements cu_skip_flag or rqt_root_cbf. Both are flags, i.e. they can only take the value zero or one. The first syntax element, cu_skip_flag, when equal to one, specifies that the CU is encoded in SKIP mode, i.e. neither motion vectors nor residual signal transform coefficients are transmitted. The motion parameters are derived by so-called block merging, which means that the motion data of a selected merge candidate are reused for the current CU. The list of merge candidates consists of prediction parameters from spatially neighboring or temporally co-located blocks which have been previously transmitted in the bitstream. The other syntax element, rqt_root_cbf, is only present when the SKIP mode is not used. If it is encoded as equal to zero, it specifies that all the transform coefficients for the current CU are zero. With the help of these two syntax elements, it is possible to set up to 6144 transform coefficients (4096 luma plus two times 1024 chroma) to zero in the case of a 64×64 CU at the cost of very little bit rate, since only a single flag has to be transmitted. This is not captured in the multi-frame transform coefficient optimization problem in Eq. 3.18, where it is assumed that each transform coefficient contributes individually to the resulting total bit rate.
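The coefficient count quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `coefficients_per_cu` is hypothetical, and 4:2:0 chroma sampling is assumed, which is what the stated 1024 chroma coefficients per plane for a 64×64 CU imply.

```python
def coefficients_per_cu(cu_size: int) -> int:
    """Transform coefficients covered by one square CU, assuming 4:2:0 sampling."""
    luma = cu_size * cu_size
    # Assumption: each chroma plane is subsampled by 2 horizontally and vertically.
    chroma_per_plane = (cu_size // 2) ** 2
    return luma + 2 * chroma_per_plane


# 64x64 CU: 4096 luma + 2 * 1024 chroma coefficients
print(coefficients_per_cu(64))
```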
Due to the described highly efficient all-zero handling, changing one single transform coefficient from zero to one in a previously all-zero CU will lead to a higher bit rate increase than incrementing an already non-zero coefficient by one in absolute value.
The increase of the $\ell_1$-norm, which serves as an approximation of the bit rate in Eq. 3.18, however, will be the same in both cases.
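The following minimal sketch illustrates why the norm-based rate approximation cannot distinguish the two cases. The coefficient values and the helper names `l1_norm` and `l1_increase` are made up for illustration; the real bit cost of each change would have to come from the entropy coder and is not modeled here.

```python
def l1_norm(coeffs):
    """Sum of absolute values of a flat list of quantized coefficient levels."""
    return sum(abs(c) for c in coeffs)


def l1_increase(before, after):
    """Change of the norm-based rate approximation between two coefficient sets."""
    return l1_norm(after) - l1_norm(before)


# Case 1: a previously all-zero CU receives its first non-zero coefficient.
all_zero_cu = [0] * 16
delta_first_nonzero = l1_increase(all_zero_cu, [1] + [0] * 15)

# Case 2: an already non-zero coefficient is incremented by one.
nonzero_cu = [3] + [0] * 15
delta_increment = l1_increase(nonzero_cu, [4] + [0] * 15)

# Both changes look identical to the approximation, although the first one
# additionally gives up the cheap all-zero signaling.
print(delta_first_nonzero, delta_increment)
```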
In this section, it is shown how the efficient treatment of all-zero blocks in H.265/HEVC can also be taken into account for the multi-frame transform coefficient optimization. Three variants are compared for the Class C (832×480) and Class D (416×240) sequences in Tab. 4.3 using the Bjøntegaard delta bit rate (BD bit rate) as proposed in [Bjø01], where a negative value shows a gain in coding performance. For the multi-frame optimization, N = 4 frames are considered jointly. The first variant, denoted in the table as “off,” refers to the case where no special handling is performed and the transform coefficients as obtained from the optimization are directly used for the HEVC bitstream. Note that this is the same as the column showing the performance of applying the QP-dependent µ rule in Tab. 4.2. Its coding performance serves as the reference configuration in the following. In the second variant, denoted as “single frame,” two rate distortion (RD) costs are determined for each CU, namely the costs resulting from
• using the transform coefficients from the optimization, and
• setting all the transform coefficients of the CU to zero.
For $x \in \{\text{opt}, \text{zero}\}$, corresponding to encoding the optimized transform coefficients or an all-zero CU, the RD cost $J_{x,\text{single}}$ can be derived from the Lagrangian multiplier $\lambda$, the number of bits $R_x$, and the distortion in the current frame $D_{\text{currentCU},x}$ as follows:

$$J_{\text{opt,single}} = \lambda \cdot R_{\text{opt}} + D_{\text{currentCU,opt}} \tag{4.3}$$
$$J_{\text{zero,single}} = \lambda \cdot R_{\text{zero}} + D_{\text{currentCU,zero}} \tag{4.4}$$

If $J_{\text{zero,single}} < J_{\text{opt,single}}$, the CU is encoded as all-zero. Note that here, as in the rate distortion optimization of the HM encoder, the actual bit rate and distortion are used, not the approximations from Eq. 3.18. Further note that only the distortion of the current CU is considered, and therefore the impact on subsequent frames is neglected.
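The single-frame decision rule of Eqs. 4.3 and 4.4 can be sketched as follows. The class `CuCandidate`, the function names, and all numbers are hypothetical and chosen only for illustration; in an actual encoder, the bits and distortion would be measured by really encoding the CU both ways.

```python
from dataclasses import dataclass


@dataclass
class CuCandidate:
    bits: float        # R_x: actual number of bits for this way of coding the CU
    distortion: float  # D_currentCU,x: distortion of the current CU only


def rd_cost_single(candidate: CuCandidate, lam: float) -> float:
    """Single-frame RD cost J_x,single = lambda * R_x + D_currentCU,x."""
    return lam * candidate.bits + candidate.distortion


def encode_as_all_zero_single(opt: CuCandidate, zero: CuCandidate, lam: float) -> bool:
    """True if the all-zero CU has the strictly lower single-frame RD cost."""
    return rd_cost_single(zero, lam) < rd_cost_single(opt, lam)


# Illustrative numbers only (not taken from the experiments).
lam = 5.0
opt = CuCandidate(bits=40.0, distortion=100.0)   # J = 5 * 40 + 100 = 300
zero = CuCandidate(bits=1.0, distortion=250.0)   # J = 5 * 1 + 250 = 255
print(encode_as_all_zero_single(opt, zero, lam))
```

With these made-up values the all-zero CU wins, even though the optimized coefficients might have reduced the distortion of later frames, which this cost does not see.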
While the “single frame” variant still achieves an average gain of 5.6 % BD bit rate over the HM anchor configuration, it shows a significant loss compared to the “off” configuration. Since the only difference to the “off” configuration is that in the “single frame” configuration some CUs are set to zero, it can be concluded that transform coefficients which would otherwise benefit subsequent frames are dropped in this way, because the “investment” to encode those coefficients does not pay off from the perspective of a single frame.
As a consequence, the impact on subsequent frames must also be considered in the computation of the RD cost for deciding whether a CU is to be set equal to zero. This is done by including an additional distortion term which reflects the distortion that occurs in the subsequent N − 1 frames under consideration. As above, for $x \in \{\text{opt}, \text{zero}\}$, corresponding to encoding the optimized transform coefficients or an all-zero CU, the RD cost $J_{x,\text{multi}}$ can be derived from the Lagrangian multiplier $\lambda$, the number of bits $R_x$, the distortion in the current frame $D_{\text{currentCU},x}$, and the distortion in the subsequent frames $D_{\text{subsequentFrames},x}$:

$$J_{\text{opt,multi}} = \lambda \cdot R_{\text{opt}} + D_{\text{currentCU,opt}} + D_{\text{subsequentFrames,opt}} \tag{4.5}$$
$$J_{\text{zero,multi}} = \lambda \cdot R_{\text{zero}} + D_{\text{currentCU,zero}} + D_{\text{subsequentFrames,zero}} \tag{4.6}$$
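Based on this, two related RD costs can be derived as

$$\hat{J}_{\text{opt,multi}} = \lambda \cdot R_{\text{opt}} + D_{\text{currentCU,opt}} \tag{4.7}$$
$$\hat{J}_{\text{zero,multi}} = \lambda \cdot R_{\text{zero}} + D_{\text{currentCU,zero}} + \underbrace{D_{\text{subsequentFrames,zero}} - D_{\text{subsequentFrames,opt}}}_{\Delta D_{\text{subsequentFrames,zero}}} \tag{4.8}$$

where it holds that

$$J_{\text{opt,multi}} < J_{\text{zero,multi}} \;\Leftrightarrow\; \hat{J}_{\text{opt,multi}} < \hat{J}_{\text{zero,multi}}. \tag{4.9}$$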
Here, $\hat{J}_{\text{opt,multi}}$ in Eq. 4.7 is equal to the RD cost for the single frame case $J_{\text{opt,single}}$, i.e. without consideration of subsequent frame dependencies, and $\Delta D_{\text{subsequentFrames,zero}}$ is a distortion difference which gives the amount by which the distortion in the subsequent frames is increased by encoding the current CU as all-zero instead of using the optimized transform coefficients.

$$\hat{J}_{\text{opt,multi}} = J_{\text{opt,single}} \tag{4.10}$$
$$\hat{J}_{\text{zero,multi}} = J_{\text{zero,single}} + \Delta D_{\text{subsequentFrames,zero}} \tag{4.11}$$
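The equivalence in Eq. 4.9 follows from subtracting the same term, the subsequent-frame distortion of the optimized case, from both sides of the inequality. Spelled out with the definitions from Eqs. 4.5 to 4.8, a short derivation reads:

```latex
\begin{align*}
J_{\text{opt,multi}} < J_{\text{zero,multi}}
&\;\Leftrightarrow\;
\lambda R_{\text{opt}} + D_{\text{currentCU,opt}} + D_{\text{subsequentFrames,opt}}
< \lambda R_{\text{zero}} + D_{\text{currentCU,zero}} + D_{\text{subsequentFrames,zero}} \\
&\;\Leftrightarrow\;
\lambda R_{\text{opt}} + D_{\text{currentCU,opt}}
< \lambda R_{\text{zero}} + D_{\text{currentCU,zero}}
+ \bigl(D_{\text{subsequentFrames,zero}} - D_{\text{subsequentFrames,opt}}\bigr) \\
&\;\Leftrightarrow\;
\hat{J}_{\text{opt,multi}} < \hat{J}_{\text{zero,multi}}
\end{align*}
```

Since only a common term is removed, the decision is unchanged, and only the distortion difference, not the two absolute subsequent-frame distortions, is needed for the comparison.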
Putting it all together, if the distortion difference $\Delta D_{\text{subsequentFrames,zero}}$ is added to the single frame RD cost $J_{\text{zero,single}}$, the impact on subsequent frames is considered in the decision between using the optimized coefficients or encoding an all-zero CU.

                                       BD bit rate [%]
Class              Sequence            Fixed prediction params.   Sliding window based
Class C (832×480)  BasketballDrill     −4.9                       −9.2
                   BQMall              −0.8                       −3.4
                   PartyScene          −4.4                       −6.9
                   RaceHorses          −1.7                       −5.3
Class D (416×240)  BasketballPass       0.4                       −2.9
                   BlowingBubbles      −2.3                       −5.9
                   BQSquare            −3.9                       −6.2
                   RaceHorses          −2.2                       −6.8
AVERAGE                                −2.5                       −5.8

Table 4.4: BD bit rate results for N = 3 comparing the performance using fixed and sliding window based prediction parameters.

Note that $\Delta D_{\text{subsequentFrames,zero}}$ is a value which has to be computed individually for each CU. For that purpose, it is assumed that all other CUs are encoded using the optimized transform coefficients (ceteris paribus assumption), and the resulting distortion for the subsequent frames is determined, once using the optimized coefficients and once using an all-zero block for the current CU. The difference between the latter and the former gives the value of $\Delta D_{\text{subsequentFrames,zero}}$. The results for using these modified RD costs, which incorporate the impact on subsequent frames, are shown in the column denoted as “multi frame” in Tab. 4.3. The last column shows the difference between the “multi frame” and the “off” variant. It can be seen that this leads to a moderate but consistent gain of about 1 % BD bit rate over directly using the transform coefficients which result from solving the optimization problem in Eq. 3.18.
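The per-CU computation of the distortion difference and the resulting multi-frame decision (Eqs. 4.10 and 4.11) can be sketched as below. This is an assumption-laden illustration: `subsequent_distortion` is a hypothetical callable standing in for the evaluation of the subsequent N − 1 frames with all other CUs held at their optimized coefficients, and the numbers are invented.

```python
from typing import Callable


def delta_d_subsequent(subsequent_distortion: Callable[[str], float]) -> float:
    """Distortion increase in the subsequent frames caused by zeroing the current CU.

    `subsequent_distortion(choice)` is a hypothetical evaluator that returns the
    distortion of the subsequent frames for choice "opt" or "zero" of the current
    CU, with all other CUs kept at their optimized coefficients (ceteris paribus).
    """
    return subsequent_distortion("zero") - subsequent_distortion("opt")


def encode_as_all_zero_multi(j_opt_single: float, j_zero_single: float,
                             delta_d: float) -> bool:
    """Multi-frame decision: compare J_opt,single with J_zero,single + delta_d."""
    return j_zero_single + delta_d < j_opt_single


# Illustrative stand-in for the real distortion evaluation (made-up values).
toy_distortion = {"opt": 900.0, "zero": 1000.0}
delta_d = delta_d_subsequent(lambda choice: toy_distortion[choice])

# With single-frame costs of 300 (optimized) and 255 (all-zero), the single-frame
# rule would zero the CU, but adding delta_d = 100 keeps the optimized coefficients.
print(delta_d, encode_as_all_zero_multi(300.0, 255.0, delta_d))
```

In this toy setting the two distortion evaluations per CU are the expensive part, which is why the sketch isolates them behind a single callable.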