Over the past few years, a variety of powerful and sophisticated wavelet-based image compression schemes have been developed, such as the Embedded Zerotree Wavelet (EZW) coding scheme, Set Partitioning in Hierarchical Trees (SPIHT), Set Partitioning in Embedded Block (SPECK) [26] and Embedded Block Coding with Optimized Truncation (EBCOT) [27]. Among the wavelet zerotree-based image coding algorithms, SPIHT is the most widely recognized because of its excellent rate-distortion performance. SPECK is a block-based coding scheme of lower complexity than SPIHT, because it maintains only two ordered lists whereas SPIHT maintains three ordered auxiliary lists. Nevertheless, the performance of SPECK is close to that of SPIHT. EBCOT is also a block-based coding scheme with a modest amount of complexity.
SPIHT exploits a zerotree structure, whereas SPECK exploits a zero-block structure, to capture inter- and intra-subband correlations. In zerotree-based algorithms, wavelet coefficients corresponding to the same spatial location and orientation are grouped to form a spatial orientation tree. In the significance test, a tree with no significant coefficient with respect to a given threshold is coded as a zerotree. On the other hand, zero-block-based algorithms divide the transformed coefficients into contiguous blocks and perform the significance test on the individual blocks. Insignificant blocks are coded as zero blocks, while significant blocks are recursively partitioned in search of significant coefficients. The advantage of this method is that it uses an adaptive quadtree splitting scheme to zoom into high-energy areas of a region, so that the blocks are coded with minimal significance maps. Other well-known block-based algorithms are embedded zero-block coding (EZBC) by Hsiang and Woods [28] and Subband Hierarchical Block Partitioning (SBHP) by Chrysafis et al. [29]. SBHP is a form of SPECK that was incorporated into JPEG 2000 while the standard was under development. EZBC exploits the dependence among quadtree representations of subbands and uses sophisticated context-based arithmetic coding to improve the coding efficiency. Danyali and Mertins [30] proposed fully scalable SPIHT (FS-SPIHT), suitable for heterogeneous networks in which users have different network access bandwidths and processing capabilities. Recently, Xie et al. [31] enabled SPECK to have full scalability based on the idea of quality layer formation, similar to Post Compression Rate Distortion (PCRD) in JPEG 2000. Being a block-based coder, EBCOT, which is adopted in the JPEG 2000 standard, generates feature-rich bit streams with low memory requirements, but it is highly complex. This is due to the use of multiple coding passes within each bit plane, context-adaptive arithmetic coding and rate-distortion optimization. Cho and Pearlman [32] addressed the reasons for the difference in coding performance between the zerotree coding schemes EZW and SPIHT. Subsequently, Moinuddin et al. [33],[34] proposed list-based block-tree coding algorithms which reduce the dynamic memory requirements while giving excellent low bit rate performance.
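The zero-block significance test and recursive quadtree splitting described above can be illustrated with a minimal sketch. It is not the SPECK algorithm of [26]: it omits the ordered lists, octave-band partitioning, sign bits and refinement passes, and it handles a single threshold only. The function names and the sample coefficient array are hypothetical.

```python
def is_significant(coeffs, r0, c0, size, threshold):
    """Return True if any |coefficient| in the block reaches the threshold."""
    return any(
        abs(coeffs[r][c]) >= threshold
        for r in range(r0, r0 + size)
        for c in range(c0, c0 + size)
    )


def code_block(coeffs, r0, c0, size, threshold, bits, found):
    """Emit one significance bit per tested block; split significant blocks."""
    if not is_significant(coeffs, r0, c0, size, threshold):
        bits.append(0)          # the whole block is coded as a zero block
        return
    bits.append(1)
    if size == 1:
        found.append((r0, c0))  # a single significant coefficient was reached
        return
    half = size // 2
    for dr in (0, half):        # visit the four quadrants in raster order
        for dc in (0, half):
            code_block(coeffs, r0 + dr, c0 + dc, half, threshold, bits, found)


# Hypothetical 4 x 4 array of transform coefficients (illustrative values).
coeffs = [
    [34, 3, 1, 0],
    [-2, 1, 0, 0],
    [1, 0, -20, 1],
    [0, 0, 2, 0],
]
bits, found = [], []
code_block(coeffs, 0, 0, 4, 16, bits, found)
print(bits)   # significance map for threshold 16
print(found)  # positions of the significant coefficients
```

In this example the two quadrants without any coefficient of magnitude 16 or more each cost a single zero bit, which is the saving that quadtree splitting is meant to provide.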
Most of the algorithms discussed above require a large amount of memory, as well as memory management, because the list nodes are updated on each bit-plane pass. To overcome these shortcomings, listless variants of SPIHT (No List SPIHT (NLS)) [35] and SPECK (Listless SPECK (LSK)) [36] have been reported in the literature. However, the performance of NLS and LSK is very close to that of SPIHT and SPECK, respectively. Hence, there is scope to further improve the performance of LSK and NLS.
Though wavelet-based coding algorithms provide a substantial improvement in image quality at lower rates compared to DCT-based coders, at the cost of complexity, DCT is still used in many applications such as JPEG [4], MPEG-4 and H.264 [37],[38] because of its compression performance and computational advantages. Recently, DCT-based coders with innovative data organization strategies and coefficient representations have been reported with high compression efficiency [39]-[44]. An embedded image coder based on DCT was proposed by Xiong et al. [39]. They introduced a wavelet-like tree structure of DCT coefficients and applied an embedded zerotree quantizer to the DCT coefficients as in the EZW coder, which yielded better performance than wavelet-based EZW. Davis and Chawla [40] proposed significance tree quantization (STQ) optimized for a given class of images. Monoro and Dickson [41] applied the sorting algorithm of EZW. Junqiang and Zhuang [42] applied the SLCCA wavelet-based image coder to DCT subbands. Hou et al. [43] presented an image coder that utilizes set partitions based on quadtree splitting (EQDCT); it provides excellent coding performance with lower complexity. Recently, Song and Cho [44] reported DCT-based embedded coders with compression performance higher than JPEG 2000 for texture images.
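One simple way to give block-DCT coefficients a subband-like organization is to gather the coefficients of equal frequency index from all blocks into one contiguous tile. The sketch below shows only this regrouping step, under the assumption that the image has already been transformed block by block. The exact tree structure used in [39] is not reproduced here, and the function name and data layout are hypothetical.

```python
def to_subband_layout(blocks, n=8):
    """Regroup block-DCT coefficients so that equal frequencies sit together.

    blocks[i][j] is the n x n coefficient array of the block at block-row i
    and block-column j. Coefficient (u, v) of every block is gathered into
    one contiguous tile located at tile position (u, v) of the output.
    """
    rows, cols = len(blocks), len(blocks[0])
    out = [[0] * (n * cols) for _ in range(n * rows)]
    for i in range(rows):
        for j in range(cols):
            for u in range(n):
                for v in range(n):
                    out[u * rows + i][v * cols + j] = blocks[i][j][u][v]
    return out


# Hypothetical input: 2 x 3 blocks whose "coefficients" are index tuples,
# which makes it easy to see where each one ends up.
blocks = [[[[(i, j, u, v) for v in range(8)] for u in range(8)]
           for j in range(3)] for i in range(2)]
layout = to_subband_layout(blocks)
print(layout[0][:3])  # DC coefficients of the first block-row
```

After this step the top-left tile holds the DC terms of all blocks, much like the coarsest subband of a wavelet decomposition, so a zerotree or zero-block coder can be applied to the rearranged array.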
The Discrete Tchebichef Transform (DTT) is a new class of transform derived from the discrete class of the popular Tchebichef polynomials; it is a novel orthonormal version of an orthogonal transform. It has found applications in image analysis and compression [45]-[49]. Mukundan [45]-[47] proposed the orthonormal version of Tchebichef moments and analyzed some of their computational aspects. Mukundan and Hunt [48] showed that, for natural images, DTT and DCT exhibit similar energy compaction performance. Lang et al. [49] compared the 4 × 4 Tchebichef moment transform with the DCT. They claim that 4 × 4 Tchebichef moments offer a significant advantage in terms of reconstruction error and average Huffman code length. A block-wise moment computation scheme which avoids numerical instabilities and yields perfect reconstruction has been introduced in the literature [50]. A number of fast algorithms have been proposed for the computation of Tchebichef moments [51]-[53]. Tchebichef moment compression is intended for smaller computing devices owing to its low computational complexity. Ishwar et al. [52] showed that DTT has lower complexity because it requires the evaluation of only algebraic expressions (add and shift operations, no multiplications), whereas an implementation of DCT requires integer approximation or intermediate scaling, as in the integer cosine transform (ICT) [37]. Abdelwahab [53] proposed a fast 2 × 2 pruned DTT algorithm for the 4 × 4 DTT. It reduces computational complexity by 26% compared to the algorithm in [51] without reducing the image reconstruction accuracy. Several algorithms for pruning the 1-D DCT [54]-[58] and the 2-D DCT [59]-[62] have been reported. Therefore, there is a need to develop DTT-based fast pruning algorithms with better PSNR performance.
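To make the orthonormal DTT kernel concrete, the sketch below builds a small orthonormal polynomial basis on the sample points 0, ..., N-1 and applies it as a separable 2-D transform. As an assumption made for clarity, the basis is obtained by Gram-Schmidt orthogonalization of the monomials rather than by the recurrence relations of [45]-[47] or the fast algorithms of [51]-[53]. For small N the result is the orthonormal discrete Tchebichef basis, but this route is not numerically suitable for large N. All function names and the sample block are hypothetical.

```python
import math


def dtt_matrix(n):
    """Orthonormal polynomial basis on the points 0..n-1 (row = degree).

    Built by Gram-Schmidt on the monomials 1, x, x^2, ...; this is an
    illustrative construction, not the recurrence used in the literature.
    """
    rows = []
    for degree in range(n):
        vec = [float(x ** degree) for x in range(n)]
        for basis in rows:
            proj = sum(a * b for a, b in zip(vec, basis))
            vec = [a - proj * b for a, b in zip(vec, basis)]
        norm = math.sqrt(sum(a * a for a in vec))
        rows.append([a / norm for a in vec])
    return rows


def forward(t, block):
    """2-D transform T * block * T^T using plain nested lists."""
    n = len(t)
    tmp = [[sum(t[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * t[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def inverse(t, coeffs):
    """Inverse transform T^T * coeffs * T (valid because T is orthonormal)."""
    n = len(t)
    tmp = [[sum(t[u][x] * coeffs[u][v] for u in range(n)) for v in range(n)]
           for x in range(n)]
    return [[sum(tmp[x][v] * t[v][y] for v in range(n)) for y in range(n)]
            for x in range(n)]


t4 = dtt_matrix(4)
# Hypothetical 4 x 4 block of pixel values.
block = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
restored = inverse(t4, forward(t4, block))
print([[round(value) for value in row] for row in restored])
```

For N = 4 the degree-1 row is proportional to (-3, -1, 1, 3), so apart from one common scale factor per row the kernel has small integer entries, which is consistent with the add-and-shift implementations mentioned above.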
Some important characteristics of DTT can be summarized as follows:
space.
• The absence of numerical approximation terms allows a more accurate representation of image features than is possible with conventional transforms.
• DTT is invariant to linear transforms and can be efficiently used for image reconstruction [45].
• The dynamic range of DTT is comparable to that of DCT [52].
• DTT is robust against channel errors [63].
• DTT polynomials have properties that closely match the human visual system (HVS) [64].
• In video compression, the prediction residuals of motion compensation can contain large variations. DTT can help maintain a consistent video quality when inter-frame coding is used [65].
Therefore, there is a need to further analyze the performance of DTT over DCT on a JPEG baseline codec and embedded codec coupled with some novel coefficient arrangement techniques.
The performance of listless embedded coding algorithms such as NLS and LSK can be improved using some novel techniques. The algorithms can be coupled with wavelet and DCT/DTT-based transforms in order to assess their performance against other wavelet- or DCT-based SPIHT coders. For complexity-constrained encoding situations where even a fast fixed-complexity DCT algorithm is too complex, one can resort to approximating the DCT computation at the cost of some degradation in image quality. Such applications include multimedia, mobile communications, personal digital assistants (PDAs), digital cameras and the Internet, where a large amount of image transmission and processing is required.
Even though a number of algorithms for fast computation of DCT are available in the literature, there has been a lot of interest in finding approximate integer versions of the floating-point DCT [66]-[71]. A family of integer cosine transforms (ICT) based on the theory of dyadic symmetry is proposed in [66], where it is shown that the performance of the ICTs is close to that of DCT. A novel architecture for a 2-D 8 × 8 DCT which needs only 24 adders is presented in [67]. The architecture allows scalable computation of the 2-D 8 × 8 DCT using integer encoding of a 1-D radix-8 DCT. 8 × 8 versions of two transformation matrices, one for the coarsest and another for the finest approximation level of the exact DCT (denoted D̂1 and D̂5, respectively), are proposed in [68]. Using these two matrices, a trade-off between speedup and accuracy in various bit ranges can be achieved. The reported performance shows a 73% complexity reduction with only 0.2 dB PSNR degradation. A family of 8 × 8 biorthogonal transforms called binDCT, all of which approximate the popular 8 × 8 DCT, is proposed in [69]. These binDCTs show a coding gain in the range 8.77-8.82 dB despite requiring as few as 14 shifts and 31 additions per eight input samples. The 8 × 8 binDCT gives finer approximations to the exact DCT and is suitable for VLSI implementation. A new kind of transform called the signed DCT (SDCT), obtained by applying the signum function to the DCT, is proposed in [70]. However, SDCT and its inverse are not orthogonal, and the transformation needs 24 additions. An 8 × 8 transform matrix is presented in [71], obtained by appropriately inserting 20 zeros into the elements of D̂1 [68]. It achieves a 25% reduction in computation over SDCT, and the matrix is orthogonal. Unlike the matrices proposed in [72]-[74], the transform order need not be a specific integer or a power of 2.
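The idea behind the signed DCT can be illustrated with a short sketch that takes the element-wise sign of the 8 × 8 DCT-II matrix. It is an assumption-laden illustration rather than the construction of [70]: the function names are hypothetical, and the plain matrix-vector product used here does not reflect the 24-addition fast structure cited above.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def signed_dct_matrix(n=8):
    """Replace every DCT entry by its sign (for n = 8 no entry is zero)."""
    return [[1 if value > 0 else -1 for value in row] for row in dct_matrix(n)]


def apply(matrix, samples):
    """Matrix-vector product; with +/-1 entries only adds and subtracts remain."""
    return [sum(m * s for m, s in zip(row, samples)) for row in matrix]


sdct = signed_dct_matrix()
for row in sdct:
    print(row)

# Inner product of rows 1 and 3: a nonzero value shows the rows are not orthogonal.
gram_1_3 = sum(a * b for a, b in zip(sdct[1], sdct[3]))
print(gram_1_3)
```

Because every entry is +1 or -1, the transform needs no multiplications, and the nonzero inner product between rows 1 and 3 is a concrete instance of the loss of orthogonality noted above.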
Implementing a transform kernel with the conventional approach requires a number of multipliers. Multipliers are the major power-hungry elements in a hardware device. Here the focus is on distributed arithmetic (DA) computation, which does not require multipliers [75]-[77]. Several DA-based approaches have been presented in the literature. These approaches either use a look-up table [75] or work without one [76],[77]. Therefore, there is scope to develop a novel integer-based 8 × 8 orthogonal sparse transform matrix for the considered set of applications.
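The look-up-table form of distributed arithmetic can be sketched in software as follows. This is a generic textbook-style illustration, not the architecture of [75]: it assumes integer coefficients and signed inputs in two's complement, and the function names, coefficient values and word length are hypothetical. The inner product of a fixed coefficient vector with the input samples is formed bit plane by bit plane, using only table look-ups, shifts and additions.

```python
def build_lut(coeffs):
    """LUT[address] = sum of the coefficients whose address bit is set."""
    k = len(coeffs)
    return [sum(coeffs[i] for i in range(k) if (address >> i) & 1)
            for address in range(1 << k)]


def da_inner_product(lut, samples, width):
    """Bit-serial inner product of fixed coefficients with signed samples.

    samples must be integers representable in `width`-bit two's complement.
    Only table look-ups, shifts and adds are used; no multiplications.
    """
    mask = (1 << width) - 1
    words = [s & mask for s in samples]   # two's-complement bit patterns
    acc = 0
    for bit in range(width):
        address = 0
        for i, word in enumerate(words):
            address |= ((word >> bit) & 1) << i
        partial = lut[address] << bit
        # The most significant bit has negative weight in two's complement.
        acc += -partial if bit == width - 1 else partial
    return acc


coeffs = [3, -1, 4, 2]       # hypothetical fixed kernel row
samples = [5, -7, 2, -1]     # hypothetical 4-bit signed inputs
lut = build_lut(coeffs)
result = da_inner_product(lut, samples, 4)
direct = sum(c * s for c, s in zip(coeffs, samples))
print(result, direct)
```

The table grows as 2^K for K inputs, which is the cost that the approaches without a look-up table [76],[77] are designed to avoid.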