Dept of Electrical and Computer Engineering, Singapore
2. An Overview of Mesh Coding Algorithms
2.1. Static Mesh Compression
Over the past decade, much research has been focused on compression of 3D static meshes and a majority of these [7, 8, 10, 11, 26, 34, 40, 42] address efficient encoding of mesh connectivity. Using spiralling tree-based encoding schemes, popular algorithms like Cut-
Figure 1. Examples of animations synthesized by (a) changing mesh geometry, as seen in frames 95 and 105 of the Chicken animation, or (b) changing both mesh geometry and connectivity (around the mouth) for realistic facial expression generation.
Figure 2. (a) Image segmentation into equal-sized blocks is a simple pre-processing step for motion prediction in video coding; a segmented frame from the Foreman sequence is shown. (b) Segmentation of the Dino 3D mesh (2039 vertices, 3999 triangles) into 29 pieces using spectral mesh decomposition [28]. Since 3D meshes are non-planar, segmenting them into coherent pieces is a non-trivial and compute-intensive task.
to 1.5-2 bits per mesh triangle. A few multiresolution geometry-cum-connectivity representation techniques [17, 32] have been proposed to facilitate progressive data transmission and achieve a compression efficiency of around 4-10 bits per vertex (bpv). In these algorithms, compression of mesh geometry is only supplementary to the connectivity coding scheme.
Geometry compression, which involves coding of the floating point (x,y,z) vertex coordinates, is inherently lossy and has been attempted using predictive coding as well as signal processing-based techniques. Predictive coding [42] exploits correlation in the mesh data by predicting a vertex position using the positions of its neighboring vertices. Prediction errors are quantized and entropy coded for compact representation. A typical compression efficiency of 7-12 bpv is obtained using this scheme. Spectral compression [48] is a popular signal processing-based mesh compression method, where the mesh geometry is projected onto an orthonormal basis and reconstructed using a small number of components in the basis. This method achieves a compression efficiency of around 14 bpv for perceptually lossless encoding. A wavelet-based geometry compression technique [2] achieves a compression efficiency of 8 bpv. Since the number of components for geometry reconstruction can be adaptively varied, [48] and [2] can also be used for multiresolution mesh representation. Other techniques that tackle the problem of mesh representation with various levels of detail are [32, 8].
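The predict, quantize and encode pipeline of predictive geometry coding can be illustrated with a minimal sketch. It is not the scheme of [42]: the predictor (the mean of already-decoded neighbors), the uniform quantizer step and all function names are assumptions made for illustration, and the entropy coder is omitted.

```python
def predict(decoded, nbrs):
    """Assumed predictor: mean of the already-decoded neighbor positions."""
    if not nbrs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(decoded[j][c] for j in nbrs) / len(nbrs) for c in range(3))


def encode(vertices, neighbors, step):
    """Closed-loop predictive coding of vertex positions.

    vertices  : list of (x, y, z) tuples in coding order
    neighbors : neighbors[i] lists indices j < i used to predict vertex i
    step      : uniform quantizer step size (hypothetical parameter)
    Returns integer residual triples; an entropy coder would follow.
    """
    decoded, residuals = [], []
    for i, v in enumerate(vertices):
        pred = predict(decoded, neighbors[i])
        q = tuple(round((v[c] - pred[c]) / step) for c in range(3))
        residuals.append(q)
        # Predict from what the decoder will see, so that quantization
        # error does not accumulate from vertex to vertex.
        decoded.append(tuple(pred[c] + q[c] * step for c in range(3)))
    return residuals


def decode(residuals, neighbors, step):
    """Mirror of encode(): rebuild positions from quantized residuals."""
    decoded = []
    for i, q in enumerate(residuals):
        pred = predict(decoded, neighbors[i])
        decoded.append(tuple(pred[c] + q[c] * step for c in range(3)))
    return decoded
```

Because the encoder predicts from reconstructed rather than original positions, the reconstruction error of every coordinate stays within half a quantizer step; this is also why the coding is inherently lossy.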
2.2. Dynamic Mesh Coding Algorithms
As noted above, there are two types of 3D animation sequences: (i) dynamic geometry sequences, where mesh motion is achieved by moving the mesh vertices with time, and (ii) dynamic geometry-cum-connectivity sequences, where mesh motion is accompanied by changes in mesh geometry as well as connectivity. Dynamic geometry compression algorithms can be grouped into three major classes based on their implementation: registration-based, prediction-based and PCA-based multiresolution representation. Examples of these algorithms are discussed below.
2.2.1. Registration-Based Compression
Lengyel's pioneering work on registration-based dynamic geometry compression [25] segments the mesh into smaller sub-meshes and represents the motion of each sub-mesh using rigid-body affine transforms. His compression mechanism yields an efficiency of 3.45 bpvf (bpv per frame) for the Chicken animation, with 16 and 4 bits used for affine and vertex quantization respectively. Ibarria et al. report a compression efficiency ranging from 1.37 to 2.91 bpvf for their Dynapack algorithm [18] when the quantization ranges from 7 to 13 bits for test animations. Their algorithm exploits space-time coherence in dynamic geometry by predicting the position of each vertex v in frame f from three of its neighbors in f and the positions of v and its neighbors in the previous frame.
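A space-time predictor of the kind described for [18] can be sketched as follows. The exact predictor used by Dynapack is not given in this section, so the formula below is an assumption: a parallelogram prediction from three neighbors a, b, c in the current frame, corrected by the error that the same parallelogram prediction made in the previous frame. All names are hypothetical.

```python
def add(p, q):
    return tuple(x + y for x, y in zip(p, q))


def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))


def parallelogram(a, b, c):
    """Spatial prediction: complete the parallelogram a + b - c."""
    return sub(add(a, b), c)


def space_time_predict(a, b, c, v_prev, a_prev, b_prev, c_prev):
    """Predict vertex v in frame f.

    a, b, c                : three neighbors of v in frame f
    v_prev                 : position of v in frame f-1
    a_prev, b_prev, c_prev : the same neighbors in frame f-1
    """
    spatial = parallelogram(a, b, c)
    # How far off the spatial prediction was in the previous frame.
    prev_error = sub(v_prev, parallelogram(a_prev, b_prev, c_prev))
    return add(spatial, prev_error)
```

Under this assumed predictor, a neighborhood that only translates between frames is predicted exactly, so only deformation contributes to the residual that must be quantized and coded.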
A video coding-like method, which segments the 3D mesh into blocks and computes motion vectors and error residuals for each mesh block, is proposed by Ahn et al. [1]. A compression efficiency of 9.6 bpvf is obtained for the Chicken animation using an encoding scheme that consists of I (Intra), P (Predicted) and B (Bi-directionally predicted) meshes. In Gupta et al.'s dynamic geometry compression scheme [13], the mesh is partitioned into segments, and the displacement of vertices in each segment is computed using Iterative Closest Point (ICP)-based registration. The encoding scheme describes mesh motion using a few affine parameters and residual errors to achieve a compression efficiency of 2.5 bpvf for the Chicken animation.
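The idea of describing a segment's motion with a few parameters plus per-vertex residuals can be sketched as below. This is a simplification and not the method of [1] or [13]: the motion model is assumed to be a single translation per segment (the mean displacement), whereas [13] uses affine parameters; quantization and entropy coding are omitted, and the function names are hypothetical.

```python
def segment_motion(prev, curr, segment):
    """Split a segment's motion into one motion vector plus residuals.

    prev, curr : lists of (x, y, z) positions in the previous/current frame
    segment    : indices of the vertices that belong to one mesh block
    """
    n = len(segment)
    # Assumed motion model: mean displacement of the segment.
    mv = tuple(sum(curr[i][c] - prev[i][c] for i in segment) / n
               for c in range(3))
    residuals = {i: tuple(curr[i][c] - prev[i][c] - mv[c] for c in range(3))
                 for i in segment}
    return mv, residuals


def reconstruct(prev, segment, mv, residuals):
    """Decoder side: previous frame + motion vector + residual."""
    return {i: tuple(prev[i][c] + mv[c] + residuals[i][c] for c in range(3))
            for i in segment}
```

When a segment moves coherently, the residuals are close to zero and cost few bits after quantization, which is why the quality of the segmentation matters for this class of coders.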
2.2.2. Prediction-Based Compression
Another interesting work on dynamic geometry compression is that of Yang et al. [46], based on vertex-wise motion vector (MV) prediction. Each vertex is given a motion vector obtained from the neighborhood of the vertex, defined as the set of all vertices within a threshold distance around the vertex. Their coding procedure requires a third of the bitrate of [25] for the same quality of animation reconstruction, measured in terms of signal-to-noise ratio (SNR). Stefanoski et al. propose a connectivity-based prediction technique in [38], where prediction is performed in a frame-to-frame fashion using the previous frame and the partly decoded current frame. Mesh connectivity is used to determine the order of vertex compression, and the spatial-cum-temporal dependency between vertex locations is exploited using a non-linear spatio-temporal predictor with angle-preserving properties. They report a 25% improvement in compression performance over competing prediction schemes like [18], especially for high quality animation reconstruction. Muller et al. [31] propose another prediction-based compression algorithm using Differential Pulse Code Modulation (DPCM), where errors in prediction from the previously decoded mesh are clustered in an octree. Only a representative from each cluster is used for further processing, which results in a significant reduction in bit-rate.
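Neighborhood-based motion vector prediction can be sketched as follows. The text only states that the motion vector is obtained from vertices within a threshold distance; the averaging rule, the zero-vector fallback and the names used here are assumptions and not details taken from [46].

```python
import math


def predict_motion_vector(v_index, positions, known_mvs, radius):
    """Predict the motion vector of one vertex from its neighborhood.

    positions : list of (x, y, z) vertex positions in the reference frame
    known_mvs : dict {vertex index: motion vector} for already-coded vertices
    radius    : threshold distance defining the neighborhood (hypothetical)
    """
    p = positions[v_index]
    near = [mv for j, mv in known_mvs.items()
            if j != v_index and math.dist(p, positions[j]) <= radius]
    if not near:
        # No coded neighbor in range: fall back to "no motion".
        return (0.0, 0.0, 0.0)
    # Assumed rule: plain average of the neighbors' motion vectors.
    return tuple(sum(mv[c] for mv in near) / len(near) for c in range(3))
```

The encoder would then code only the difference between the actual and the predicted motion vector of each vertex.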
2.2.3. Multiresolution Representation
Recently, multiresolution mesh representation for bandwidth-limited streaming applications has generated much interest. A notable work on multiresolution representation of dynamic geometry is that of Alexa et al. [3], who propose a Principal Component Analysis (PCA)-based compact animation representation scheme where each mesh in the animation sequence is projected on a basis of n PCA eigenvectors. The animation may be reconstructed using k eigenvectors, where k << n. The higher the k, the greater the level of detail. An example of wavelet-based multiresolution encoding is that of Guskov et al. [14], which exploits parametric coherence in mesh sequences using an anisotropic wavelet transform and progressively encodes wavelet details. Payan et al. [33] propose another wavelet-based multiresolution representation scheme, built on temporal lifting, that exploits the temporal redundancy in dynamic geometry.
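To illustrate how a temporal lifting step exposes redundancy along a vertex trajectory, the sketch below applies one level of Haar lifting to a single coordinate over time. The choice of the Haar wavelet and the function names are assumptions; the filters actually used in [33] are not stated in this section.

```python
def lift_forward(signal):
    """One level of Haar lifting along time.

    signal : one vertex coordinate sampled at consecutive frames
    Returns (approx, detail): a half-rate trajectory and the temporal details.
    """
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for o, e in zip(odd, even)]          # predict step
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update step
    if len(even) > len(odd):                             # odd-length input
        approx.append(even[-1])
    return approx, detail


def lift_inverse(approx, detail):
    """Undo lift_forward() by reversing the update and predict steps."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    if len(approx) > len(detail):
        even.append(approx[-1])
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for i, e in enumerate(even):
        out.append(e)
        if i < len(odd):
            out.append(odd[i])
    return out
```

For slowly moving vertices the detail coefficients are small, and applying `lift_forward` again to `approx` would give the coarser temporal levels of a multiresolution representation.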
In [21], Karni et al. propose a compression scheme that employs a combination of Principal Component Analysis (PCA) and Linear Predictive Coding (LPC). Recently, localized PCA-based dynamic geometry coding techniques have yielded good compression performance. Sattler et al. [35] propose animation compression using Clustered PCA (CPCA), where the mesh is first segmented into meaningful components based on vertex motion analysis and PCA is applied on each of these components. This compression scheme outperforms both pure PCA-based and PCA+LPC approaches while achieving better animation reconstruction. Another Localized PCA Analysis (LPCA)-based compression scheme is proposed by Amjoun et al. [4]. Upon clustering the mesh using local similarity properties, a local coordinate system is defined for each cluster, with respect to which the cluster motion is encoded using PCA. LPCA coding achieves better compression performance compared to CPCA-based compression for similar quality of reconstructed animation.
2.2.4. Other Coding Algorithms
Varakliotis et al. propose animation encoding with RTP packetization in [43], and recommend insertion of I frames in the encoded/transmitted mesh sequence to maintain animation smoothness. A Differential Pulse Code Modulation (DPCM)-based encoder is used to compress the animation, although its compression efficiency is low. The main contributions of this work include (i) analysis of the trade-off between compression performance and reconstructed animation quality and (ii) introduction of the Peak Mean Square Error (PMSE)-based distortion metric to tackle degradation of animation smoothness under noise.
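The role of periodic I frames in a DPCM-coded mesh sequence can be illustrated with a small sketch. It is not the codec of [43]: RTP packetization is omitted, each frame is a flat list of coordinates, a lost frame is concealed by repeating the previous one, and the parameter names (`step`, `i_period`, `lost`) are hypothetical.

```python
def encode_sequence(frames, step, i_period):
    """Temporal DPCM with an intra-coded (I) frame every `i_period` frames."""
    stream, ref = [], None
    for t, frame in enumerate(frames):
        intra = (t % i_period == 0)
        base = [0.0] * len(frame) if intra else ref
        q = [round((x - b) / step) for x, b in zip(frame, base)]
        # Closed loop: the next prediction uses the decoded frame.
        ref = [b + qi * step for b, qi in zip(base, q)]
        stream.append(("I" if intra else "P", q))
    return stream


def decode_sequence(stream, step, lost=()):
    """Decode the stream; frame indices in `lost` are treated as dropped.

    Frame 0 is assumed to be received. A dropped frame is concealed by
    holding the previous decoded frame (an assumed strategy).
    """
    out, ref = [], None
    for t, (kind, q) in enumerate(stream):
        if t in lost and ref is not None:
            out.append(ref)
            continue
        base = [0.0] * len(q) if kind == "I" else ref
        ref = [b + qi * step for b, qi in zip(base, q)]
        out.append(ref)
    return out
```

In this sketch the error caused by a lost P frame propagates through the following P frames and stops at the next I frame, which is the motivation for inserting I frames at the cost of a higher bit-rate.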
MPEG-4 Part 25 [19] presents generic tools for dynamic 3D mesh compression using Bone-Based Animation (BBA), which involves decomposition of geometric motions in the animation into elementary transformations, and Frame-based Animated Mesh Compression (FAMC), where the animation is divided into segments that can be decoded independently. A spatially and temporally scalable compression scheme for 3D animations using FAMC, in which the original animation is reconstructed at multiple layers corresponding to different spatial resolutions, is proposed in [39]. Boulfani et al. [5] propose a 3D dynamic mesh compression scheme where geometry compensation is performed upon clustering the mesh using motion characteristics, followed by application of the scan-based wavelet transform.
2.2.5. Encoding 3D Dynamic Meshes with Changing Connectivity
Very few works deal with compression of animations with changing connectivity. Shamir et al. [36] suggest a multiresolution representation scheme for animations with changing connectivity using the T-DAG data structure, which can be incrementally constructed even as the input mesh is processed. Gupta et al. [12] propose an Iterative Closest Point (ICP) registration-based geometry-cum-connectivity coding scheme for dynamic 3D MMs. As in [13], the current and previous frames are partitioned to generate sub-meshes, and inter-mesh correspondences are computed using ICP to identify the added/deleted vertices over time. Subsequently, the errors in geometry and connectivity prediction are encoded and transmitted.
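How inter-mesh correspondences can reveal added and deleted vertices is sketched below. Only a closest-point matching step is shown; the alignment iterations of ICP and the sub-mesh partitioning of [12] are omitted, and the greedy matching rule, the tolerance `tol` and the function name are assumptions.

```python
import math


def match_vertices(prev, curr, tol):
    """Greedy closest-point correspondence between two frames.

    prev, curr : lists of (x, y, z) vertex positions
    tol        : maximum distance for two vertices to correspond (hypothetical)
    Returns (matches, added, deleted), where matches maps curr -> prev index.
    """
    unmatched_prev = set(range(len(prev)))
    matches, added = {}, []
    for j, p in enumerate(curr):
        best = min(unmatched_prev,
                   key=lambda i: math.dist(prev[i], p),
                   default=None)
        if best is not None and math.dist(prev[best], p) <= tol:
            matches[j] = best
            unmatched_prev.remove(best)
        else:
            added.append(j)          # no counterpart in the previous frame
    deleted = sorted(unmatched_prev)  # previous vertices left unmatched
    return matches, added, deleted
```

Matched vertices can then be coded as geometry prediction errors, while the added and deleted lists describe the connectivity change between the two frames.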