In [7], Chan, Yu, and Constantinides describe several coding schemes using quadtrees in which the split/merge operations used to construct the tree are controlled by prediction error criteria. We propose to use a quadtree to encode motion vectors for a block-matching motion-compensated video coder in which the tree is constructed using the bit-minimization principle. The basic coder design is similar to the $p \times 64$ coder shown in Figure 2.18. The difference is that motion vectors are now coded with a quadtree whose leaves represent regions of uniform motion. Conceptually, one might associate a motion vector with each node of the quadtree, which, for internal nodes, is refined further down the tree. Under this view, for each node other than the root, the difference between the node's and its parent's motion vectors is transmitted. Thus, one can construct the motion vector for each leaf by adding the root's motion vector to the sum of the differences encountered along the path from the root to the given leaf.
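The reconstruction of leaf motion vectors from transmitted differences can be illustrated with a minimal sketch. The `Node` class, the `leaf_vectors` function, and the example tree are hypothetical names and values chosen for illustration; the only assumption taken from the text is that each non-root node carries the difference from its parent's vector.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[int, int]


@dataclass
class Node:
    # Difference between this node's motion vector and its parent's.
    # For the root, this holds the root's motion vector itself.
    diff: Vec
    children: List["Node"] = field(default_factory=list)


def leaf_vectors(node: Node, parent_mv: Vec = (0, 0)) -> List[Vec]:
    """Return the absolute motion vector of every leaf, depth first."""
    mv = (parent_mv[0] + node.diff[0], parent_mv[1] + node.diff[1])
    if not node.children:
        return [mv]
    result: List[Vec] = []
    for child in node.children:
        result.extend(leaf_vectors(child, mv))
    return result


# Hypothetical two-level tree: the second quadrant is refined further.
root = Node((2, -1), [
    Node((0, 0)),
    Node((1, 0), [Node((0, 0)), Node((0, 1)), Node((-1, 0)), Node((0, 0))]),
    Node((0, 0)),
    Node((-2, 1)),
])
print(leaf_vectors(root))
```

Each leaf's vector is the running sum of the differences on its root-to-leaf path, so a decoder never needs more than the path it is currently descending.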
We propose to combine the quadtree encoding of motion vectors with the basic motion-compensated video coder specified in the $p \times 64$ standard. Conceptually, the main difference between this coder and the $p \times 64$ coder is in the encoding of the motion vectors. Each leaf in the quadtree encodes a motion vector for a number of $8 \times 8$ blocks. Therefore, the quadtree decomposition ends at the $8 \times 8$ level.
Given a motion field and its quadtree decomposition, we code the structure of the tree using an adaptive arithmetic code to indicate whether a node is a leaf or not (a different adaptive coder is used for each level). The motion vector differences at each node are coded using another adaptive arithmetic code (again, using a different coder for each level). For each leaf node, the $8 \times 8$ transform-coded blocks subsumed by the node are transmitted in scan order. The decision of how to code the block (choosing from among alternatives similar to those in the $p \times 64$ standard) is also transmitted using an adaptive arithmetic coder. If the quantized transform coefficients are transmitted, this is done using the run-length/Huffman coding method from the $p \times 64$ standard. The counts for the adaptive arithmetic coder are updated once after each frame is coded.
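As a rough illustration of an adaptive model whose counts change only between frames, consider the following sketch. The class name `FrameAdaptiveModel`, the initial count of 1 per symbol, and the symbol names are assumptions made for the example; the text does not specify how the counts are initialized or stored.

```python
import math
from collections import Counter


class FrameAdaptiveModel:
    """Frequency model whose counts are frozen while a frame is coded."""

    def __init__(self, alphabet):
        # Assumption: every symbol starts at count 1 so that no symbol
        # has zero probability.
        self.counts = {s: 1 for s in alphabet}
        self.pending = Counter()

    def cost(self, symbol):
        """Ideal code length in bits under the current (frozen) counts."""
        total = sum(self.counts.values())
        return -math.log2(self.counts[symbol] / total)

    def observe(self, symbol):
        """Record a coded symbol without changing current probabilities."""
        self.pending[symbol] += 1

    def end_frame(self):
        """Fold the symbols seen during the frame into the counts."""
        for symbol, n in self.pending.items():
            self.counts[symbol] += n
        self.pending.clear()


# One model per tree level for the leaf/split flag (three levels assumed).
structure_models = [FrameAdaptiveModel(["leaf", "split"]) for _ in range(3)]

model = structure_models[0]
before = model.cost("leaf")
for _ in range(6):
    model.observe("leaf")
during = model.cost("leaf")   # unchanged: counts are frozen within a frame
model.end_frame()
after = model.cost("leaf")    # cheaper now that "leaf" has been seen often
```

Because the probabilities stay fixed while a frame is coded, the cost of a symbol does not depend on the order in which nodes are visited within that frame.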
To perform motion estimation, we adopt the bit-minimization strategy elaborated in Chapter 3. The quadtree coding structure described in Section 4.1.2 has several useful properties that make a dynamic programming solution possible for finding an optimal set of motion vectors, that is, one that minimizes the sum of the code-lengths needed to encode the motion vectors and the transform-coded prediction error. Since the arithmetic code used for the motion vector differences at each node does not change during the coding of a particular frame, the optimal number of bits to code the motion vector differences for any subtree is independent of the coding of any other disjoint subtree. Similarly, the transform coding of the prediction errors is independent for disjoint subtrees.
We now describe a dynamic programming algorithm for choosing an optimal quadtree. For each node in the tree, we store a table indexed by the (absolute) motion vector of the node's parent. For each possible motion vector $\vec{v}'$ of the parent, this table gives the minimum code-length to code the subtree rooted at the current node given that the parent's motion vector is $\vec{v}'$. Also stored with each table entry is a motion vector giving the minimum code-length. Construction of the tables is performed in a bottom-up fashion, starting at the $8 \times 8$ block level. For a node $p$, the table is constructed by finding, for each choice of motion vector $\vec{v}'$ for the parent node, a motion vector for $p$ that results in the minimum code-length for the subtree rooted at $p$. If $p$ is at the $8 \times 8$ block level, this is done by computing the transform code-length of the prediction error for each motion vector in the search range $S$ and noting the minimum code-length and the corresponding motion vector. Otherwise, we consider for each motion vector $\vec{v}$ in $S$ the code-length needed to transform-code the prediction errors if the quadtree is pruned at $p$. (This quantity can be computed in a preprocessing step.) We also consider the code-length if the quadtree is not pruned at $p$. This code-length is computed by indexing the tables of the children of $p$ with $\vec{v}$ and summing. The minimum of these two quantities is added to the number of bits needed to code the difference between $\vec{v}$ and $\vec{v}'$. The result is the minimum code-length required to code the subtree rooted at $p$ given motion $\vec{v}'$ at $p$'s parent node.
Once the minimum code-length is computed for the root of the quadtree, the motion vectors for each node in the tree are determined by going back down the tree, using the tables constructed on the way up. The optimal motion vector for the root node is made known to its children. Each child uses this to index its table to find its optimal motion vector. Pruning of the tree is also performed as a result.
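The bottom-up table construction and top-down assignment can be sketched as follows. This is an illustrative reading of the description, not the thesis implementation. `Node`, `build_tables`, `assign`, the `mv_bits` cost function, and all numeric values are hypothetical. The sketch also makes three assumptions of its own: the root's vector is coded as a difference from the zero vector, the motion vector difference is charged at every level including the $8 \times 8$ level, and the bits for the leaf/split flag are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vec = Tuple[int, int]


@dataclass
class Node:
    # prune_bits[v]: bits to transform-code this node's blocks when the
    # whole region uses motion vector v (assumed precomputed).
    prune_bits: Dict[Vec, float]
    children: List["Node"] = field(default_factory=list)
    # table[v_parent] = (min bits for subtree, best v, prune here?)
    table: Dict[Vec, Tuple[float, Vec, bool]] = field(default_factory=dict)
    mv: Vec = (0, 0)
    is_leaf: bool = True


def build_tables(node: Node, S: List[Vec],
                 mv_bits: Callable[[Vec, Vec], float]) -> None:
    """Bottom-up pass: fill node.table for every node in the subtree."""
    for child in node.children:
        build_tables(child, S, mv_bits)
    inner = {}  # inner[v] = (bits excluding the cost of v itself, pruned?)
    for v in S:
        pruned = node.prune_bits[v]
        split = (sum(c.table[v][0] for c in node.children)
                 if node.children else float("inf"))
        inner[v] = (min(pruned, split), pruned <= split)
    for vp in S:
        best = min(S, key=lambda v: inner[v][0] + mv_bits(vp, v))
        node.table[vp] = (inner[best][0] + mv_bits(vp, best),
                          best, inner[best][1])


def assign(node: Node, parent_v: Vec) -> None:
    """Top-down pass: read the chosen vector and pruning decision."""
    _, node.mv, node.is_leaf = node.table[parent_v]
    if not node.is_leaf:
        for child in node.children:
            assign(child, node.mv)


# Hypothetical example: two candidate vectors, one level above the blocks.
S = [(0, 0), (1, 0)]


def mv_bits(vp: Vec, v: Vec) -> float:
    # Made-up difference cost: 1 bit plus 2 bits per unit of L1 distance.
    return 1 + 2 * (abs(v[0] - vp[0]) + abs(v[1] - vp[1]))


blocks = [Node({(0, 0): 10, (1, 0): 30}) for _ in range(3)]
blocks.append(Node({(0, 0): 40, (1, 0): 10}))
root = Node({(0, 0): 70, (1, 0): 100}, blocks)

build_tables(root, S, mv_bits)
assign(root, (0, 0))
total_bits = root.table[(0, 0)][0]
```

In this example the tree is not pruned at the root: three blocks inherit the root's vector and the fourth pays for a nonzero difference, because that is cheaper than coding all four blocks with a single vector. Each node does work proportional to $|S|^2$, since every parent vector is compared against every candidate vector.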
[Figure 4.4: MSE vs. Rate for Trevor. The plot shows MSE (0 to 140) against rate in bits per pixel per frame (0 to 0.4) for the PVRG, M1, M2, and QT coders.]
The dynamic programming algorithm requires $O(N|S|^2)$ time, where $N$ is the number of $8 \times 8$ blocks in the frame and $S$ is the search region for block-matching. The space requirement is $O(N|S|)$.
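One way to see where these bounds come from is the following count. It assumes a complete quadtree above the $N$ blocks (so $N$ is a power of 4), that the pruned code-lengths are already available from preprocessing, and that each motion vector difference cost can be evaluated in constant time.

```latex
\begin{align*}
\#\text{nodes} &= N + \frac{N}{4} + \frac{N}{16} + \cdots + 1 \;<\; \frac{4N}{3}, \\
\text{time} &\le \frac{4N}{3} \cdot \underbrace{|S|}_{\text{table entries}} \cdot \underbrace{|S|}_{\text{candidates per entry}} = O(N|S|^2), \\
\text{space} &\le \frac{4N}{3} \cdot |S| = O(N|S|).
\end{align*}
```

The internal nodes add only a constant factor over the $N$ bottom-level blocks, which is why $N$ rather than the total node count appears in the bounds.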
4.3 Experimental Results
The quadtree coder is implemented using routines from the PVRG coder for motion compensation, transform coding, quantization, and run-length/Huffman coding of the quantized transform coefficients. The quadtree structure and motion vector information are not actually coded; however, adaptive probability models are maintained and used to compute the number of bits that an ideal arithmetic coder would produce. Our objective is to explore the limits in performance improvement by aggressively optimizing the coding of motion vectors.
We performed coding simulations using 50 frames of the grayscale $256 \times 256$ "Trevor" sequence. The $p \times 64$ coders were modified to accept input at $256 \times 256$ resolution. An operational rate-distortion plot for the quadtree and $p \times 64$ coders is given in Figure 4.4. The quadtree coder gives better rate-distortion performance for rates under about 0.25 bits/pixel. Furthermore, the range of achievable rates is extended to well below the 0.05 bits/pixel achievable with M2, albeit with higher distortion.
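For reference, the two quantities plotted in an operational rate-distortion curve of this kind can be computed as in the sketch below. The function names, the total bit count, and the sample pixel values are hypothetical and are not measurements from these experiments; only the frame size and frame count come from the text.

```python
def bits_per_pixel(total_bits, width, height, frames):
    """Average rate in bits per pixel per frame."""
    return total_bits / (width * height * frames)


def mse(original, decoded):
    """Mean squared error between two equal-length pixel sequences."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


# Hypothetical total of 819200 bits over 50 frames of 256 x 256 pixels.
rate = bits_per_pixel(819200, 256, 256, 50)

# Hypothetical four-pixel example.
distortion = mse([10, 20, 30, 40], [12, 18, 30, 44])
```

Each coder setting contributes one (rate, MSE) point, and sweeping the setting traces out a curve like those in Figure 4.4.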