DPCM+Huffman coding - Power Efficient Data Compression Hardware for Wearable and Wireless Biomedical Sensing Devices

4.4 Limitations

4.4.1 DPCM+Huffman coding

4.4.1.1 Pre-knowledge of the signal required

Huffman coding requires a code book generated from the probability density function (PDF) of the original signal. To achieve the best compression, the code book must be generated from a large training dataset that closely reflects the characteristics of the original signal, which can make the compression process inefficient in practice.
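The code book generation described above can be sketched in Python as follows. This is a minimal illustration, not the hardware or software used in this work: it assumes a first-order DPCM predictor (each sample predicted by the previous one), counts the residual symbols of a training sequence to estimate their PDF, and builds a Huffman code book from those counts. The helper names `dpcm_residuals` and `huffman_code_book` and the training samples are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def dpcm_residuals(samples):
    """First-order DPCM (assumed predictor): residual[n] = x[n] - x[n-1]."""
    return [b - a for a, b in zip(samples, samples[1:])]


def huffman_code_book(symbol_counts):
    """Build a Huffman code book {symbol: bit string} from symbol counts."""
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(n, next(tie), {s: ""}) for s, n in symbol_counts.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs a 1-bit code word
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


# Hypothetical 12-bit training samples, for illustration only.
training = [2048, 2050, 2049, 2049, 2052, 2050, 2048, 2049]
residuals = dpcm_residuals(training)  # [2, -1, 0, 3, -2, -2, 1]
book = huffman_code_book(Counter(residuals))
for symbol, code in sorted(book.items(), key=lambda kv: len(kv[1])):
    print(symbol, code)
```

The most frequent residual (-2) receives the shortest code word, while any value absent from the training data receives no code word at all.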

4.4.1.2 Biased code book

A code book generated from limited training data will usually not cover all possible symbols. For instance, the human seizure signal used in this work has a bit width of 12, and after DPCM the bit width becomes 13 bits, so in theory there are $2^{13} = 8192$ possible values for the input symbols. However, as shown in Table 4.2, the code book of the seizure signal has only 1910 entries, which means that only 1910 different symbols were found in the training dataset. If a value that is not included in this code book occurs afterwards, the encoder has no code word for it, and the compression process might fail. To overcome this problem, all theoretically possible symbols of the target signal have to be included in the training dataset, so that the code book covers every symbol that may later appear in a real signal. As a result, the PDF of the signal symbols is altered: the probability distribution is stretched, the average code length per symbol becomes longer than before, and the compression ratio inevitably deteriorates.

Table 4.6: Comparison of the partially covered and fully covered code books

Data type | CR using the partially covered code book | CR using the fully covered code book
EEG from the healthy people | 1.82 | 1.71
Seizure-free EEG from the epilepsy patients | 2.2 | 2.04
Seizure | 1.51 | 1.37
Mice EEG data @ 1000 Hz | 1.47 | 1.36
Mice EEG data @ 200 Hz | 1.45 | 1.34
EMG data | 2.59 | 1.9
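The coverage problem can be illustrated with a short sketch, again assuming a first-order DPCM on 12-bit samples. The helper names are hypothetical, and padding the counts with one occurrence of each missing symbol is an assumed way of including all theoretical symbols, not necessarily the exact procedure used in this work.

```python
from collections import Counter

BIT_WIDTH = 12                              # raw sample width of the seizure EEG
LO, HI = -(1 << BIT_WIDTH), 1 << BIT_WIDTH  # 13-bit two's-complement range
ALL_SYMBOLS = range(LO, HI)                 # 2**13 = 8192 possible residuals


def dpcm_residuals(samples):
    """First-order DPCM (assumed predictor): residual[n] = x[n] - x[n-1]."""
    return [b - a for a, b in zip(samples, samples[1:])]


def missing_symbols(code_book_symbols):
    """Residual values that could occur but have no entry in the code book."""
    return [s for s in ALL_SYMBOLS if s not in code_book_symbols]


def fully_covered(training_counts):
    """Pad the training counts so every possible symbol appears at least once."""
    counts = Counter(training_counts)
    for s in missing_symbols(counts):
        counts[s] = 1
    return counts


# Illustrative 12-bit samples that swing between the extremes of the range.
samples = [0, 4095, 4094, 0, 1]
residuals = dpcm_residuals(samples)   # [4095, -1, -4094, 1]
counts = Counter(residuals)
print(len(missing_symbols(counts)))   # 8188 symbols would still be uncoded
full = fully_covered(counts)
print(len(full), sum(full.values()))  # 8192 8192
```

With only four observed symbols, the padding dominates the training counts, which is an extreme version of the PDF stretching described above.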

Table 4.6 compares the code books generated with and without the extra symbols added to the training datasets. For every data type, the compression ratio decreases when the code book covers all possible signal values, with the largest drop for the EMG data (from 2.59 to 1.9). The compression performance of Huffman coding deteriorates more significantly when the original code book has fewer entries, since adding all the missing symbols then brings a bigger change to the PDF of the original signal.
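To see why the fully covered code book lowers the CR, the sketch below builds Huffman code lengths from a hypothetical, sharply peaked residual histogram (not data from this work), once as is and once padded with every other 13-bit symbol, and evaluates the average code length on the original training data. The CR estimate is an assumption: 12 raw bits per sample divided by the average code length, ignoring any framing overhead.

```python
import heapq
from itertools import count

# Hypothetical residual histogram (NOT measured data): sharply peaked at 0.
TRAINING = {d: 2 ** (12 - abs(d)) for d in range(-12, 13)}
ALL_SYMBOLS = range(-4096, 4096)  # every 13-bit DPCM residual
RAW_BITS = 12                     # assumed bits per uncompressed sample


def huffman_lengths(counts):
    """Return {symbol: code length} of a Huffman code (needs >= 2 symbols)."""
    tie = count()
    heap = [(n, next(tie), [s]) for s, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n0, _, g0 = heapq.heappop(heap)
        n1, _, g1 = heapq.heappop(heap)
        merged = g0 + g1
        for s in merged:
            lengths[s] += 1  # each merge pushes these leaves one level deeper
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return lengths


def average_length(counts, lengths):
    """Average code length in bits per symbol, weighted by counts."""
    return sum(n * lengths[s] for s, n in counts.items()) / sum(counts.values())


partial_book = huffman_lengths(TRAINING)

padded = dict(TRAINING)
for s in ALL_SYMBOLS:
    padded.setdefault(s, 1)  # add each missing symbol once (assumed scheme)
full_book = huffman_lengths(padded)

for name, book in (("partially covered", partial_book), ("fully covered", full_book)):
    avg = average_length(TRAINING, book)
    print(f"{name}: {len(book)} entries, {avg:.3f} bits/symbol, CR ~ {RAW_BITS / avg:.2f}")
```

Because the partial code book is optimal for the training counts, the fully covered one can only match or exceed its average code length on the same data; here it is strictly longer.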

One way to mitigate this problem is to duplicate the original training dataset several times before adding the extra symbols. This reduces the overall proportion of the extra symbols and strengthens the PDF of the original training data within the new training dataset.
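The duplication strategy can be expressed directly on the symbol counts: duplicating the training data k times multiplies each original count by k before the missing symbols are added once. The sketch below uses a hypothetical histogram and an assumed padding of one occurrence per missing symbol, and compares k = 1 with the k = 8 used for Table 4.7; `enhanced_counts` is an illustrative helper name.

```python
import heapq
from itertools import count

# Hypothetical residual histogram (NOT measured data).
TRAINING = {d: 2 ** (12 - abs(d)) for d in range(-12, 13)}
ALL_SYMBOLS = range(-4096, 4096)  # every 13-bit DPCM residual


def huffman_lengths(counts):
    """Return {symbol: code length} of a Huffman code (needs >= 2 symbols)."""
    tie = count()
    heap = [(n, next(tie), [s]) for s, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n0, _, g0 = heapq.heappop(heap)
        n1, _, g1 = heapq.heappop(heap)
        merged = g0 + g1
        for s in merged:
            lengths[s] += 1
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return lengths


def average_length(counts, lengths):
    """Average code length in bits per symbol, weighted by counts."""
    return sum(n * lengths[s] for s, n in counts.items()) / sum(counts.values())


def enhanced_counts(original, k, alphabet):
    """Duplicate the original training data k times, then add missing symbols once."""
    counts = {s: k * n for s, n in original.items()}
    for s in alphabet:
        counts.setdefault(s, 1)
    return counts


avg_by_k = {}
for k in (1, 8):
    book = huffman_lengths(enhanced_counts(TRAINING, k, ALL_SYMBOLS))
    avg_by_k[k] = average_length(TRAINING, book)
    print(f"k = {k}: {avg_by_k[k]:.3f} bits/symbol on the original training data")
```

Since each Huffman code is optimal for the counts it was built from and the padding is the same for both values of k, the k = 8 code book can never give a longer average code on the original data than the k = 1 code book; how much shorter it is depends on the histogram.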

The CRs achieved with these enhanced code books improve for every data type, slightly for most of them and markedly for the EMG data (from 1.9 to 2.44), as shown in Figures 4.5 to 4.10 and Table 4.7.

Table 4.7: Comparison of the code books with and without duplicating the original training datasets

Data type | CR using the fully covered code book without duplicating the training datasets | CR using the fully covered code book with the original training datasets duplicated 8 times
EEG from the healthy people | 1.71 | 1.77
Seizure-free EEG from the epilepsy patients | 2.04 | 2.11
Seizure | 1.37 | 1.39
Mice EEG data @ 1000 Hz | 1.36 | 1.46
Mice EEG data @ 200 Hz | 1.34 | 1.43
EMG data | 1.9 | 2.44

Figure 4.5: Performance improvement of code book generated with the healthy people’s EEG signal


Figure 4.6: Performance improvement of code book generated with the seizure-free EEG signal

Figure 4.8: Performance improvement of code book generated with the mice EEG signal sampled at 1000 Hz

Figure 4.9: Performance improvement of code book generated with the mice EEG signal sampled at 200 Hz


Figure 4.10: Performance improvement of code book generated with the EMG signal

4.4.1.3 Specified code book

As mentioned earlier, three different code books were trained with human EEG data from three different neurological conditions: the signal from healthy people, and the seizure and seizure-free signals from epilepsy patients. Each code book is trained on a specific dataset to obtain the best compression result for that type of signal. In reality, however, a wearable wireless device implementing this Huffman compressor could be used by anyone, regardless of their neurological condition, and the compression ratio can be much worse when a signal is compressed with a code book that does not fully reflect the PDF of the input signal.

Table 4.8 shows the compression ratios achieved when different code books are used to compress the given data. All the code books listed cover all possible input symbols, like the fully covered code books in Table 4.6, but they were generated without duplicating the original training dataset.

For instance, compressing the EEG of healthy people with the code book trained on the seizure signal weakens the performance of this two-stage (DPCM+Huffman) technique.
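The effect of a mismatched code book can be sketched by building one Huffman code from a broad, hypothetical residual histogram (standing in for a seizure-trained code book) and using it to encode a narrow, hypothetical histogram (standing in for healthy EEG). Both histograms are invented for illustration and cover the same symbols, so no padding is needed.

```python
import heapq
from itertools import count

SYMBOLS = range(-12, 13)
# Two invented residual histograms over the same symbols (illustration only).
BROAD = {d: 13 - abs(d) for d in SYMBOLS}          # stands in for a seizure-trained book
NARROW = {d: 2 ** (12 - abs(d)) for d in SYMBOLS}  # stands in for healthy EEG


def huffman_lengths(counts):
    """Return {symbol: code length} of a Huffman code (needs >= 2 symbols)."""
    tie = count()
    heap = [(n, next(tie), [s]) for s, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n0, _, g0 = heapq.heappop(heap)
        n1, _, g1 = heapq.heappop(heap)
        merged = g0 + g1
        for s in merged:
            lengths[s] += 1
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return lengths


def average_length(counts, lengths):
    """Average code length in bits per symbol, weighted by counts."""
    return sum(n * lengths[s] for s, n in counts.items()) / sum(counts.values())


matched = average_length(NARROW, huffman_lengths(NARROW))
mismatched = average_length(NARROW, huffman_lengths(BROAD))
print(f"matched code book:    {matched:.3f} bits/symbol")
print(f"mismatched code book: {mismatched:.3f} bits/symbol")
```

The code book built from the broad histogram spreads its short code words over many symbols, so the frequent small residuals of the narrow signal receive longer code words than a matched code book would give them.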

There is clearly a set of trade-offs when choosing the code book for implementation, since it will be used to encode mixed signals, such as the signal from an epilepsy patient, which contains both seizure-free and seizure segments. A code book trained on EEG data containing both seizure-free and seizure signals will inevitably give a lower CR for each type of signal than a code book dedicated to that type. A typical seizure monitoring session was described in [87] and [88]: one subject with epilepsy was recorded with 82 seconds of seizure signals during a 70-minute monitoring process, so the seizure episodes take up around 2% of the whole monitoring time (82 s out of 4200 s). Using the code books SF-S, S-H and SF-H given in Table 4.8, the overall compression ratio can be estimated by

$$CR_{ms} = \frac{CR_{sf} \cdot t_{sf} + CR_{s} \cdot t_{s}}{t_{all}} \qquad (4.1)$$

where $CR_{ms}$, $CR_{sf}$ and $CR_{s}$ are the compression ratios of the mixed signal, the seizure-free signal and the seizure signal respectively, $t_{all}$ is the overall observation time, and $t_{sf}$ and $t_{s}$ are the total times without and with seizure activity respectively ($t_{all} = t_{sf} + t_{s}$). The resulting overall CRs for the code books SF-S, S-H and SF-H are 1.99, 1.86 and 2.05 respectively, so the code book SF-H yields a better CR than the other two. However, if seizures occur more frequently, SF-S or S-H might outperform SF-H.
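Equation (4.1) and the trade-off it implies can be evaluated with a few lines of Python. The monitoring durations (82 s of seizure in 70 minutes) are those reported in [87] and [88]; the two (CR_sf, CR_s) pairs below are hypothetical placeholders rather than the Table 4.8 values, and `break_even_fraction` is an illustrative helper that finds the seizure-time fraction at which two code books give the same CR_ms.

```python
def mixed_cr(cr_sf, cr_s, t_all, t_s):
    """Eq. (4.1): time-weighted CR of a recording with t_s seconds of seizure."""
    t_sf = t_all - t_s  # seizure-free time
    return (cr_sf * t_sf + cr_s * t_s) / t_all


def break_even_fraction(book_a, book_b):
    """Seizure-time fraction f = t_s / t_all at which two books give equal CR_ms.

    From Eq. (4.1), CR_ms(f) = CR_sf + (CR_s - CR_sf) * f, a straight line in f,
    so two code books trade places at most once. A result outside [0, 1]
    means no crossover within a real recording.
    """
    (a_sf, a_s), (b_sf, b_s) = book_a, book_b
    slope_diff = (b_s - b_sf) - (a_s - a_sf)
    if slope_diff == 0:
        return None  # parallel lines: one book wins (or ties) for every f
    return (a_sf - b_sf) / slope_diff


T_ALL = 70 * 60  # 70-minute monitoring session [87], [88], in seconds
T_S = 82         # total seizure duration in seconds
print(f"seizure fraction: {T_S / T_ALL:.2%}")  # 1.95%

# Hypothetical (CR_sf, CR_s) pairs, NOT the Table 4.8 values.
BOOK_A = (2.10, 1.00)  # tuned for seizure-free EEG
BOOK_B = (2.00, 1.40)  # compromise that handles seizures better
for name, book in (("A", BOOK_A), ("B", BOOK_B)):
    print(name, round(mixed_cr(*book, T_ALL, T_S), 3))
print("break-even seizure fraction:", break_even_fraction(BOOK_A, BOOK_B))
```

With these placeholder values, book A wins at the roughly 2% seizure fraction of the monitoring session, while book B would only win if seizures occupied more than about 20% of the recording.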

This set of trade-offs will not be further discussed in this thesis, but it could be an interesting area to look into in the future.

In document Power Efficient Data Compression Hardware for Wearable and Wireless Biomedical Sensing Devices (Page 66-72)