Chapter 7: Conclusions and Future Work
7.1 Conclusions
In this thesis, we have addressed several problems affecting image and video compression algorithms and their related transforms, namely power consumption, hardware cost, computation time and output accuracy. To this end, new architectures for image compression algorithms and the related data transforms have been introduced that take these issues into account. The original contributions towards the research objectives outlined in chapter 2 are detailed in chapters 3-6 and are summarised below:
1. Two new non-transform-based algorithms of low computational complexity for low bit rate image compression, together with their architectures, have been proposed in chapter 3. The algorithms and architectures are parameterised in terms of the number of quantisation levels, the input block size and the number of pipelining stages, offering different output precision levels and processing speeds. The performance evaluation has shown that they are suitable for small, high-speed, low-power devices: the proposed architectures can operate at up to 312 MHz, and their power consumption is circa 8 mW at an operating frequency of 50 MHz with a 4×4-pixel block size.
2. Efficient architectures for multidimensional transforms, namely the DCT and the DWT, have been proposed in chapters 4-6. Because of their low computational load, the proposed DCT architectures are based on the 1-D Radix-2 DCT and the 3-D DCT VR algorithms, whereas the proposed DWT architectures are based on a lifting scheme for the CDF 9/7 computation.
3. In chapter 4, two new high-speed architectures for the multidimensional DCT have been proposed: the first is based on the 1-D Radix-2 DCT algorithm using the RCF computation approach, while the second is based on the 3-D DCT VR algorithm. The results have revealed that computation speeds of up to 305 MHz can be achieved. At such speeds, the 3-D DCT of 512×512×8 points is computed in less than 6.8 ms. In addition, the power consumption with a 21-bit wordlength at a 10 ns clock period is as low as 98 mW, and with the same 21-bit wordlength the reconstructed data are identical to the original, i.e. the PSNR between them is infinite. Comparisons with similar architectures have revealed that both proposed architectures outperform existing designs in terms of power consumption, speed and hardware usage.
4. In chapter 5, two new low-hardware-usage architectures based on the 3-D DCT VR algorithm have been proposed. They avoid the need for transposition memory in the butterfly and post-addition stages, which reduces hardware usage and improves processing speed. The architectures are parameterisable in terms of wordlength, providing different trade-offs between output precision, power consumption, hardware usage and processing speed, and they have been tested with different images and video sequences, wordlengths and clock frequencies. The results have revealed that the first and second architectures occupy 722 and 1235 slices, respectively, and achieve maximum operating frequencies of 250 and 330 MHz, respectively, using a 14-bit output wordlength and an 8×8×8-pixel input cube. Furthermore, an infinite PSNR between the original and reconstructed frames has been attained with a 20-bit wordlength. Compared with similar 3-D DCT architectures, both achieve a significant reduction in hardware usage together with higher operating frequencies.
5. New parallel, multiplierless, lifting-based architectures for the 1-D, 2-D and 3-D CDF 9/7 DWT have been proposed, implemented and verified in chapter 6. In these architectures, the constant multipliers have been replaced with the proposed shift-add equivalents, which introduce only a negligible error. The proposed architectures also achieve a low memory requirement and a high computation speed.
6. The proposed 2-D CDF 9/7 DWT lifting-based architecture computes the 2-D DWT coefficients by applying the 1-D DWT to each row and column of 4×4-pixel data blocks. All rows of each data block are fed to corresponding 1-D units concurrently, which yields a high throughput rate and a short processing time. The results have revealed that the 2-D DWT coefficients of a 288×352-pixel frame are computed in less than 0.55 ms. Furthermore, with an 18-bit wordlength the input data can be recovered exactly from the 2-D DWT coefficients, and the power consumption of the architecture for a 144×176-pixel frame is as low as 32 mW at a 20 MHz clock frequency.
7. A parallel 3-D DWT architecture has also been proposed in chapter 6, using a separable lifting-based scheme for the CDF 9/7 wavelet filter. Four input frames are processed simultaneously, which reduces the frame buffer to a block memory of only four frames. The results have shown that the proposed 3-D DWT architecture can run at up to 151 MHz with a throughput of 4 results per cycle, which reduces the 3-D DWT computation time for a 144×176×8-pixel data set to less than 0.33 ms. The shift-add multiplier replacement has also lowered power consumption and contributed to the high computation speed: the architecture consumes 64 mW at a 40 MHz operating frequency with a 16-bit wordlength. Moreover, the proposed DWT architectures outperform similar architectures in terms of hardware usage, speed, throughput rate and latency, and the proposed 3-D DWT architecture avoids the redundant data computation found in similar parallel 3-D DWT architectures in the literature. Finally, the output accuracy of all the proposed architectures has been tested and verified using different wordlengths and input data sets; these evaluations revealed that the maximum error in the 3-D DWT coefficients of the proposed architecture with a 16-bit wordlength was less than 1.49.
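The multidimensional DCT architectures summarised in items 2-4 rest on the separability of the 3-D DCT: a 3-D transform can be obtained by applying 1-D transforms along rows, then columns, then frames. The Python sketch below illustrates only that property. It uses a direct O(N^2) orthonormal 1-D DCT-II as a stand-in for the Radix-2 and VR fast algorithms of chapters 4-5, which are not reproduced here, and the function names (`dct_1d`, `dct_3d_separable`, `dct_3d_direct`) are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II, computed directly in O(N^2) (not a fast radix-2 algorithm)."""
    n = len(x)
    out = []
    for k in range(n):
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(acc * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def dct_3d_separable(cube):
    """3-D DCT of cube[f][r][c]: 1-D DCTs along rows, then columns, then frames."""
    nf, nr, nc = len(cube), len(cube[0]), len(cube[0][0])
    # Pass 1: transform every row (innermost axis).
    a = [[dct_1d(cube[f][r]) for r in range(nr)] for f in range(nf)]
    # Pass 2: transform every column; a row-column hardware design typically
    # needs a transposition memory between pass 1 and this pass.
    b = [[[0.0] * nc for _ in range(nr)] for _ in range(nf)]
    for f in range(nf):
        for c in range(nc):
            col = dct_1d([a[f][r][c] for r in range(nr)])
            for r in range(nr):
                b[f][r][c] = col[r]
    # Pass 3: transform along the temporal (frame) axis.
    out = [[[0.0] * nc for _ in range(nr)] for _ in range(nf)]
    for r in range(nr):
        for c in range(nc):
            tube = dct_1d([b[f][r][c] for f in range(nf)])
            for f in range(nf):
                out[f][r][c] = tube[f]
    return out


def dct_3d_direct(cube):
    """Reference 3-D DCT evaluated straight from the triple-sum definition."""
    nf, nr, nc = len(cube), len(cube[0]), len(cube[0][0])

    def basis(k, i, n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    return [[[sum(cube[f][r][c] * basis(u, f, nf) * basis(v, r, nr) * basis(w, c, nc)
                  for f in range(nf) for r in range(nr) for c in range(nc))
              for w in range(nc)] for v in range(nr)] for u in range(nf)]
```

Comparing `dct_3d_separable` with `dct_3d_direct` on a small cube confirms that the three 1-D passes reproduce the full 3-D definition; the fast algorithms cited in the thesis reduce the cost of each 1-D pass rather than changing this overall structure.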
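Items 3 and 4 report an "infinite PSNR" between original and reconstructed data. PSNR is 10·log10(peak²/MSE), so it becomes infinite exactly when the mean squared error is zero, i.e. when reconstruction is lossless. The small Python helper below makes that explicit; it assumes 8-bit samples (peak value 255) passed as flat sequences, and assumes the reconstructed samples have already been rounded back to integer pixel values, which is how a finite wordlength can still give an exact match. The name `psnr` is illustrative.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equal-length sample sequences; math.inf when they are identical."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # zero error: lossless reconstruction, reported as "infinite PSNR"
    return 10.0 * math.log10(peak ** 2 / mse)
```

For example, `psnr([10, 20, 30], [10, 20, 30])` returns `math.inf`, while any nonzero error yields a finite value that decreases as the error grows.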
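Item 5 replaces the constant multipliers of the CDF 9/7 lifting steps with shift-add networks. The exact decompositions and fractional wordlengths used in chapter 6 are not given in this summary, so the Python sketch below shows one common way to derive such networks: canonical signed-digit (CSD) recoding of a fixed-point constant, followed by a multiply built only from shifts, additions and subtractions. The 12 fractional bits, the names `csd_digits`, `shift_add_multiply` and `CDF97_LIFTING`, and the floor-style truncation are all assumptions for illustration; the constants are the standard CDF 9/7 lifting coefficients.

```python
# Standard CDF 9/7 lifting constants (alpha, beta, gamma, delta) and scaling factor K.
CDF97_LIFTING = {
    "alpha": -1.586134342,
    "beta": -0.05298011854,
    "gamma": 0.8829110762,
    "delta": 0.4435068522,
    "K": 1.149604398,
}


def csd_digits(value, frac_bits):
    """CSD recoding of round(value * 2**frac_bits).

    Returns (sign, shift) pairs with sign in {+1, -1} such that the fixed-point
    integer equals sum(sign * 2**shift); no two nonzero digits are adjacent.
    """
    n = round(value * (1 << frac_bits))
    digits = []
    shift = 0
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            digits.append((d, shift))
            n -= d  # n is now divisible by 4, so the next digit is zero
        n >>= 1
        shift += 1
    return digits


def shift_add_multiply(x, digits, frac_bits):
    """Multiply integer x by a CSD-coded constant using only shifts and adds/subtracts."""
    acc = 0
    for sign, shift in digits:
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    # Drop the fractional bits with an arithmetic right shift (floor), as a simple datapath might.
    return acc >> frac_bits
```

Each nonzero CSD digit costs one adder or subtractor in hardware, so the digit count returned by `csd_digits` gives a rough measure of the area of each constant multiplier; the error relative to an exact multiply is bounded by the coefficient quantisation plus the final truncation.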
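Items 5-7 build on the lifting factorisation of the CDF 9/7 wavelet, in which the 1-D transform is computed as two predict steps and two update steps followed by scaling, and the inverse simply undoes the steps in reverse order. The floating-point Python sketch below illustrates that structure for one 1-D signal of even length; the 2-D and 3-D transforms apply it along rows, columns and frames. It is not the thesis's fixed-point hardware datapath. The symmetric boundary extension and the scaling convention (low band multiplied by K, high band divided by K) are assumptions, as other implementations swap K and 1/K, and the function names are hypothetical.

```python
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _predict(s, d, c):
    """Predict step: update odd samples d from neighbouring even samples s."""
    m = len(s)
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < m else s[m - 1]  # symmetric extension at the right edge
        d[i] += c * (s[i] + right)


def _update(s, d, c):
    """Update step: update even samples s from neighbouring odd samples d."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension at the left edge
        s[i] += c * (left + d[i])


def cdf97_forward(x):
    """One level of the 1-D CDF 9/7 DWT via lifting; returns (low band, high band)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    s = [float(v) for v in x[0::2]]
    d = [float(v) for v in x[1::2]]
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v * K for v in s], [v / K for v in d]


def cdf97_inverse(s, d):
    """Invert cdf97_forward by undoing the scaling and lifting steps in reverse order."""
    s = [v / K for v in s]
    d = [v * K for v in d]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    x = []
    for even, odd in zip(s, d):
        x.extend([even, odd])
    return x
```

Because every lifting step modifies one band using only the other, each step is trivially invertible, which is why `cdf97_inverse` recovers the input up to floating-point rounding; in a fixed-point implementation, the residual error is instead governed by the chosen wordlength, as in the accuracy figures quoted in items 6 and 7.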