3.2 Methodology
3.2.1 Data compression technique candidates
Since signal acquisition devices are widely used in clinical research, only lossless data compression techniques, which fully preserve the original information of the signal, will be considered in this research, as they satisfy the strict requirements of clinical use; lossy techniques, such as many transform-based algorithms, will not be discussed. A performance metric combining the CR and the power consumption of the compressor is introduced to evaluate existing compression techniques and to guide the development of a new technique, since the ultimate purpose of introducing a data compressor is to reduce the system power.
3.2.1.1 DPCM + Huffman coding
A data compression process that combines DPCM and the Huffman coding is selected for further analysis for the following reasons.
a. Computational simplicity
Based on the review given in chapter 2, several lossless data compression techniques can be considered for further analysis: Huffman coding, arithmetic coding, the KLT, and 2-stage encoding techniques including LZ77+Huffman coding and DPCM+Huffman coding. All of these techniques operate in the time domain except for the KLT, which is transform based. Since the KLT requires more computational resources while being unable to deliver a significantly higher CR than the time-domain techniques when compressing EEG signals [25], it will not be taken into the next stage of the research. As a result, only DPCM+Huffman coding and the other time-domain techniques mentioned above meet the requirement of computational simplicity.
b. Hardware efficiency
Arithmetic coding and Huffman coding are two well-known time-domain entropy coding techniques, and, as indicated earlier, both require comparatively low computational resources. As introduced in chapter 2, they work on similar principles: both achieve compression by assigning shorter codes to more probable symbols according to the PDF of the signal. Huffman coding assigns a unique binary code to each possible symbol, whereas the arithmetic coder encodes a sequence of symbols into a single, unique binary fraction; in both cases the encoded output is, on average, shorter than the original symbols.
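To make the code book idea concrete, the following is a minimal Python sketch of static Huffman code-book construction. It assumes that the empirical symbol frequencies of a training set stand in for the PDF of the signal. The function names `build_huffman_codebook` and `huffman_encode` are hypothetical and are not taken from any of the cited implementations. Codes are kept as bit strings for readability rather than packed as a hardware encoder would pack them.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codebook(samples):
    """Build a static Huffman code book (symbol -> bit string) from training samples.

    Empirical symbol frequencies are used as an estimate of the signal's PDF.
    """
    freqs = Counter(samples)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs a 1-bit code.
        (only,) = freqs
        return {only: "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_encode(samples, codebook):
    """Concatenate the code of each sample into one bit string."""
    return "".join(codebook[s] for s in samples)


training = [0, 0, 0, 0, 1, 1, -1, 2]  # hypothetical training symbols
codebook = build_huffman_codebook(training)
bits = huffman_encode(training, codebook)
print(codebook, len(bits))
```

In this toy example the most frequent symbol (0) receives a 1-bit code and the two rarest symbols receive 3-bit codes. The saving relative to fixed-length codes comes entirely from how skewed the training distribution is.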
In [26] and [25], arithmetic coding is shown to yield a slightly better CR than Huffman coding. However, [26] also reports a power consumption of 41 mW for the arithmetic encoder when compressing a 16-bit EEG signal. This is even higher than the power a wireless transceiver needs to send the uncompressed data: according to equations (2.19) and (2.20), transmitting 24 channels of EEG signal with the nRF8001 [56] would require 8.65 mW. Arithmetic coding is therefore unsuitable for this research in terms of hardware efficiency.
c. Promising data compression performance
The 2-stage LZ77+Huffman coding technique is ruled out because, according to [24], it performs even worse than Huffman coding alone, while it clearly also consumes more computational resources.
Huffman coding on its own gives acceptable performance when compressing EEG signals, as shown in Table 2.1. Moreover, when a DPCM encoder is added before the Huffman encoder to remove short-term redundancies, the combination reached the highest CR among all the time-domain techniques compared in [25], and its performance is even close to that of the KLT.
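The role of the DPCM stage can be illustrated with a minimal Python sketch. It assumes a first-order predictor in which each sample is predicted by the previous one; the predictor order is an assumption made here, not something specified in this section. The names `dpcm_encode` and `dpcm_decode` are hypothetical.

```python
from itertools import accumulate


def dpcm_encode(samples):
    """First-order DPCM: keep the first sample, then send successive differences."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def dpcm_decode(residuals):
    """Invert dpcm_encode losslessly by accumulating the differences."""
    return list(accumulate(residuals))


eeg = [2048, 2050, 2049, 2049, 2052]  # hypothetical 12-bit samples
residuals = dpcm_encode(eeg)
assert dpcm_decode(residuals) == eeg  # lossless round trip
print(residuals)
```

Apart from the first value, the residuals cluster around zero. The Huffman stage that follows therefore sees a much narrower distribution than the raw samples, which is what allows it to assign short codes to most symbols.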
For the above reasons, DPCM+Huffman coding is chosen as a candidate data compressor for the simplified hardware model of the signal recording system. Since little work has been conducted on the power analysis of DPCM+Huffman coding, it is selected for further testing.
3.2.1.2 Log2 sub-band encoding - a new algorithm
Entropy coding techniques of the kind considered here can only perform compression with prior knowledge of the PDF of the original signal, and Huffman coding inevitably suffers from this limitation. Huffman coding requires a pre-generated code book containing every symbol it may encounter together with the corresponding Huffman code. The code book can easily be generated from a training dataset. However, when compressing biomedical signals, unexpected symbols sometimes occur. These are usually caused by sudden changes within the physiological system, such as seizure spikes in EEG signals, or by artifacts from other sources.
The code book is therefore always a compromise that relies on typical cases to achieve gains. It contains all the symbols that occurred in the training dataset, but it does not always cover all the symbols in the dataset to be compressed.
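The coverage problem can be checked directly. The Python sketch below assumes a hand-written prefix-free code book for small DPCM residuals, such as might result from a training set without seizure activity, and reports which symbols of a new record it cannot encode. Both the code book and the helper `uncovered_symbols` are hypothetical illustrations.

```python
def uncovered_symbols(codebook, samples):
    """Return, sorted, the symbols in `samples` that have no entry in `codebook`."""
    return sorted({s for s in samples if s not in codebook})


# Hypothetical code book trained on quiet EEG residuals in the range -2..2.
codebook = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}

# New residuals containing a sudden spike (e.g. a seizure or an artifact).
new_residuals = [0, 1, -1, 57, -55, 0]
print(uncovered_symbols(codebook, new_residuals))
```

This code book cannot encode any symbol in the returned list. A practical encoder would need an extra mechanism for such symbols, for instance an escape code followed by the raw value. Such a mechanism adds complexity and is not modeled here.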
To overcome some of these limitations, a novel technique called Log2 sub-band is developed for this research, based loosely on the principles of the DPCM+Huffman coding but with considerable simplification.
Figure 3.2: Work scheme of the Log2 sub-band compression technique
In general, Log2 sub-band encoding divides each binary data sample into several parts, each of which may be a few bits long. The current sample is then compared with the previous sample part by part, starting from the most significant part, until a part is found that differs from the corresponding part of the previous sample. In the end, only that differing part and the parts after it are transmitted or stored. To reconstruct the data, a header is needed to indicate how many parts of the current data sample were transmitted or stored.
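The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research. Samples are assumed to be unsigned integers (for example, offset-binary ADC codes). Bands are compared from the most significant band downwards. The reference before the first sample is assumed to be an all-zero sample. The header is represented simply as the count of bands sent rather than as packed bits. All function names are hypothetical, and three 4-bit bands are used as the example.

```python
def split_bands(sample, widths):
    """Split an unsigned integer into bands, most significant band first."""
    parts = []
    shift = sum(widths)
    for w in widths:
        shift -= w
        parts.append((sample >> shift) & ((1 << w) - 1))
    return parts


def log2_subband_encode(samples, widths):
    """Encode each sample as (header, sent_bands); header = number of bands sent."""
    prev = [0] * len(widths)  # assumed reference before the first sample
    encoded = []
    for sample in samples:
        cur = split_bands(sample, widths)
        # Index of the first band (from the MSB side) that differs from prev;
        # len(widths) means the whole sample is unchanged.
        first_diff = next(
            (i for i, (c, p) in enumerate(zip(cur, prev)) if c != p), len(widths)
        )
        sent = cur[first_diff:]  # the differing band and every band after it
        encoded.append((len(sent), sent))
        prev = cur
    return encoded


def log2_subband_decode(encoded, widths):
    """Rebuild samples: unsent leading bands are copied from the previous sample."""
    prev = [0] * len(widths)
    samples = []
    for n_sent, sent in encoded:
        cur = prev[: len(widths) - n_sent] + list(sent)
        value = 0
        for part, w in zip(cur, widths):
            value = (value << w) | part
        samples.append(value)
        prev = cur
    return samples


widths = (4, 4, 4)  # three 4-bit bands for 12-bit data
samples = [0x800, 0x802, 0x812, 0x812, 0x800, 0x7FF]
encoded = log2_subband_encode(samples, widths)
assert log2_subband_decode(encoded, widths) == samples  # lossless
print([header for header, _ in encoded])
```

In this sketch, the last transition from 0x800 to 0x7FF changes the value by only one LSB. It still crosses a band boundary, however, so all three nibbles must be sent. Small changes are therefore cheap only when they stay within the lower bands.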
A more specific example of Log2 sub-band encoding is illustrated in Figure 3.2. In this case, the original biomedical data are assumed to be digitized at 12 bits, and each sample is first divided into three 4-bit segments (nibbles). Each nibble is then compared with the corresponding nibble of the previous data sample. Instead of transmitting or storing the whole current sample, only the first differing nibble and the nibbles after it are transmitted or stored. A header is added to each compressed sample to indicate the number of nibbles transmitted or stored, which can be 3, 2, 1 or, as in this case, none. A 2-bit header is therefore added at the end of the comparison process. Schemes can be devised for any bit width, and each band can have an arbitrary width. Figure 3.2 presents a {4, 4, 4} scheme, but {3, 4, 5} with a 2-bit header or {3, 3, 3, 3} with a 3-bit header would also be possible for 12-bit data.
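The header widths in these examples follow from the number of possible band counts. With N bands, anywhere from 0 to N bands may be sent, so the header needs ⌈log2(N + 1)⌉ bits. This gives 2 bits for {4, 4, 4} and {3, 4, 5}, and 3 bits for {3, 3, 3, 3}. The hypothetical helpers below, a Python sketch, compute this header width and the resulting size of one compressed sample.

```python
def header_bits(widths):
    """Bits needed to signal how many bands (0..N) were sent.

    For N >= 1, N.bit_length() equals ceil(log2(N + 1)) without floating point.
    """
    return len(widths).bit_length()


def compressed_bits(widths, n_sent):
    """Size of one compressed sample: header plus the n_sent least significant bands."""
    if not 0 <= n_sent <= len(widths):
        raise ValueError("n_sent must be between 0 and the number of bands")
    return header_bits(widths) + sum(widths[len(widths) - n_sent:])


for scheme in [(4, 4, 4), (3, 4, 5), (3, 3, 3, 3)]:
    sizes = [compressed_bits(scheme, k) for k in range(len(scheme) + 1)]
    print(scheme, "header:", header_bits(scheme), "bits per sample:", sizes)
```

For the {4, 4, 4} scheme, an unchanged sample costs only the 2-bit header. A sample whose most significant nibble changes costs 14 bits, which is more than the original 12. The gain therefore depends on consecutive samples usually sharing their upper bands.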
The Log2 sub-band technique thus combines the similarities of DPCM and Huffman coding into a single-stage process. Unlike Huffman coding, the Log2 sub-band does not require the PDF of the original signal to generate a code book before compression, so it is not disrupted when new symbols appear. Compared with DPCM+Huffman coding, the Log2 sub-band is therefore expected to be more adaptive to biomedical signals. A detailed comparison of compression results is given in the next chapter.