It seems logical that as the order of the model increases, compression ratios ought to improve as well. The probability of the letter u appearing in the text of this book may only be 5 percent, for example, but if the previous context character is q, the probability goes up to 95 percent. Predicting characters with high probability lowers the number of bits needed, and larger contexts ought to let us make better predictions.
Unfortunately, as the order of the model increases linearly, the memory consumed by the model increases exponentially. With an order 0 model, the space consumed by the statistics could be as small as 256 bytes. Once the order of the model increases to 2 or 3, even the most cleverly designed models will consume hundreds of kilobytes.
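The exponential growth can be illustrated with a small sketch. It assumes the most naive layout, a fully populated table with one counter for every (context, symbol) pair; the function name is hypothetical and this is not how the sample programs store their statistics.

```c
#include <assert.h>

/* Number of counters in a fully populated order-n table: one counter per
 * symbol for every possible context of n preceding symbols. */
unsigned long long full_table_entries( unsigned alphabet_size, unsigned order )
{
    unsigned long long entries = alphabet_size;
    unsigned i;

    assert( alphabet_size > 0 );
    for ( i = 0 ; i < order ; i++ )
        entries *= alphabet_size;
    return entries;
}
```

For a 256-symbol alphabet this gives 256 counters at order 0, 65,536 at order 1, and 16,777,216 at order 2, so a practical higher-order model cannot simply allocate every possible context up front.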
The conventional way of compressing data is to make a pass over the symbols to gather statistics for the model. Then a second pass is made to actually encode the data. The statistics are usually carried with the compressed data so the decoder will have a copy. This approach obviously has serious problems if the statistics for the model take more space than the data to be compressed.
Adaptive compression is the solution to this problem. In adaptive data compression, both the compressor and the decompressor start with the same model. The compressor encodes a symbol using the existing model, then updates the model to account for the new symbol. The decompressor likewise decodes a symbol using the existing model, then updates the model. As long as the algorithm to update the model operates identically for the compressor and the decompressor, the process can operate perfectly without needing to pass a statistics table from the compressor to the decompressor. Adaptive data compression has a slight disadvantage in that it starts compressing with less than optimal statistics. Once the cost of transmitting the statistics with the compressed data is subtracted, however, an adaptive algorithm will usually perform better than a fixed statistical model.
Adaptive compression also pays a price each time it updates the model. When the count for a particular symbol is updated under arithmetic coding, for example, the update code may have to adjust the cumulative counts for all the other symbols as well. With the modeling techniques needed for arithmetic coding, this leads to code that performs an average of 128 arithmetic operations for every symbol encoded or decoded.
Because of high cost in both memory and CPU time, higher-order adaptive models have only become practical in perhaps the last ten years. It is ironic that as the cost of disk space and memory goes down, the cost of compressing the data stored there also goes down. As these costs continue to decline, we will be able to implement even more effective programs than are practical today.
A Simple Example
The sample program in Chapter 4 used Huffman coding to demonstrate adaptive compression. In this chapter, the sample program will use adaptive arithmetic coding. When performing finite-context modeling, we need a data structure to describe each context used while compressing the data. If we move up from an order-0 to an order-1 model, for example, we will use the previous symbol as a context for encoding the current symbol.
An array of 256 context arrays is probably the simplest way to create the data structures for an order-1 model. As we saw in the last chapter, a simple context model for an arithmetic encoder can be created using an array of cumulative counts for each symbol. If we have 256 symbols in our alphabet, an array of pointers to 256 different context arrays can be created like this:
int *totals[ 256 ];

void initialize_model()
{
    int context;
    int i;

    for ( context = 0 ; context < END_OF_STREAM ; context++ ) {
        totals[ context ] = (int *) calloc( END_OF_STREAM + 2, sizeof( int ) );
        if ( totals[ context ] == NULL )
            fatal_error( "Failure allocating context %d", context );
        /* The entries are cumulative counts, so giving every symbol a
           count of 1 means entry i must hold the value i. */
        for ( i = 0 ; i <= ( END_OF_STREAM + 1 ) ; i++ )
            totals[ context ][ i ] = i;
    }
}
This code not only creates the 256 context arrays, it also initializes each symbol’s count to 1. At this point, we can begin encoding symbols as they come in. The loop for encoding the symbols looks similar to the one used for other adaptive programs. Here is an order 1 arithmetic compression loop:
for ( ; ; ) {
    c = getc( input );
    if ( c == EOF )
        c = END_OF_STREAM;
    convert_int_to_symbol( c, context, &s );
    encode_symbol( output, &s );
    if ( c == END_OF_STREAM )
        break;
    update_model( c, context );
    context = c;
}
This works fairly simply. Instead of just having a single context table, like the code in Chapter 5, we now have a set of 256 context tables. Every symbol is encoded using the context table from the previously seen symbol, and only the statistics for the selected context get updated after the symbol is seen. This means we can now more accurately predict the probability of a character’s appearance. The decoding process for this order 1 code is also very simple, and it looks similar to the decoding example from Chapter 5. Here is the order 1 expansion loop:
for ( ; ; ) {
    get_symbol_scale( context, &s );
    count = get_current_count( &s );
    c = convert_symbol_to_int( count, context, &s );
    remove_symbol_from_stream( input, &s );
    if ( c == END_OF_STREAM )
        break;
    putc( (char) c, output );
    update_model( c, context );
    context = c;
}
The only difference between this and conventional order-0 code is the addition of the context variable, both within the loop and as a parameter to other functions. The remaining routines that differ from the code in Chapter 5 are shown next. The C source for this module is included on the program disk.
void update_model( int symbol, int context )
{
    int i;

    /* Bump the cumulative count of every entry above the symbol. */
    for ( i = symbol + 1 ; i <= ( END_OF_STREAM + 1 ) ; i++ )
        totals[ context ][ i ]++;
    if ( totals[ context ][ END_OF_STREAM + 1 ] < MAXIMUM_SCALE )
        return;
    /* Rescale: halve the totals, keeping every symbol's count at least 1. */
    for ( i = 1 ; i <= ( END_OF_STREAM + 1 ) ; i++ ) {
        totals[ context ][ i ] /= 2;
        if ( totals[ context ][ i ] <= totals[ context ][ i - 1 ] )
            totals[ context ][ i ] = totals[ context ][ i - 1 ] + 1;
    }
}
void convert_int_to_symbol( int c, int context, SYMBOL *s )
{
    s->scale = totals[ context ][ END_OF_STREAM + 1 ];
    s->low_count = totals[ context ][ c ];
    s->high_count = totals[ context ][ c + 1 ];
}
void get_symbol_scale( int context, SYMBOL *s )
{
    s->scale = totals[ context ][ END_OF_STREAM + 1 ];
}

int convert_symbol_to_int( int count, int context, SYMBOL *s )
{
    int c;

    /* Linear search for the symbol whose range contains count. */
    for ( c = 0 ; count >= totals[ context ][ c + 1 ] ; c++ )
        ;
    s->high_count = totals[ context ][ c + 1 ];
    s->low_count = totals[ context ][ c ];
    return( c );
}
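As a way of checking how the context tables behave, the following self-contained sketch builds the same kind of cumulative table and shows that updating one context leaves the others untouched. It assumes END_OF_STREAM is 256 and that each symbol starts with a count of 1; the helper names (init_order1_model, bump_count, symbol_width, context_scale) are hypothetical, and rescaling is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>

#define END_OF_STREAM 256                     /* assumed value */
#define TABLE_ENTRIES ( END_OF_STREAM + 2 )   /* cumulative slots per context */

static int *totals[ 256 ];

/* Allocate every context and give each symbol a count of 1.
 * Returns 0 on success, -1 if an allocation fails. */
int init_order1_model( void )
{
    int context, i;

    for ( context = 0 ; context < 256 ; context++ ) {
        totals[ context ] = malloc( TABLE_ENTRIES * sizeof( int ) );
        if ( totals[ context ] == NULL )
            return -1;
        for ( i = 0 ; i < TABLE_ENTRIES ; i++ )
            totals[ context ][ i ] = i;
    }
    return 0;
}

/* Count one more occurrence of symbol in the given context (no rescaling). */
void bump_count( int symbol, int context )
{
    int i;

    assert( symbol >= 0 && symbol <= END_OF_STREAM );
    for ( i = symbol + 1 ; i < TABLE_ENTRIES ; i++ )
        totals[ context ][ i ]++;
}

/* Width of the symbol's slice of the range; probability is width / scale. */
int symbol_width( int symbol, int context )
{
    return totals[ context ][ symbol + 1 ] - totals[ context ][ symbol ];
}

/* Total of all counts in the context. */
int context_scale( int context )
{
    return totals[ context ][ END_OF_STREAM + 1 ];
}
```

After three calls to bump_count( 'u', 'q' ), the u in the q context has a width of 4 out of a scale of 260, while the u in any other context still has a width of 1 out of 257.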
Using the Escape Code as a Fallback
The simple order-1 program does in fact do a creditable job of compression, but it has a couple of problems to address. First, the model for this program makes it a slow starter. Every context starts off with 257 symbols initialized to a single count, meaning every symbol starts off being encoded in roughly eight bits. As new symbols are added to the table, they will gradually begin to be encoded in fewer bits. This process, however, will not happen very quickly.
For the context table for the letter q, for example, we will probably see a very high number of u symbols. The very first u will have a probability of 1/257, and will accordingly be encoded in eight bits. The second u will have a probability of 2/258, but will still require over seven bits to encode. In fact, it will take sixteen consecutive u symbols with no other appearances before the entropy of the symbol is reduced to even four bits.
The reason for this slow reduction in bit count is obvious. The probability of the u symbol is being weighed down by the other 256 symbols in the table. Though they may never appear in the message, they need a nonzero count. If their count were reduced to zero, we would not be able to encode them if and when they appeared in the message.
There is a solution to this problem, however, and it is relatively painless. Instead of having every symbol appear automatically in every table, start off with a nearly empty table and add symbols to the table only as they appear. The q table would have zero counts for all the other symbols, giving the first u that appears a low bit count.
But there is a catch here. If a symbol doesn’t appear in a context table, how will it be encoded when it appears in a message? The easiest way to accomplish this is to use an escape code. The escape code is a special symbol (much like the end-of-stream symbol) that indicates we need to “escape” from the current context.
When a context issues an escape symbol, we generally fall back to a lower-order context. In our next sample program, we escape to the escape context, a context that never gets updated. It contains 258 symbols, each of which has a count of 1. This guarantees that any symbol encountered in the message can be encoded by outputting an escape code from the current context and by encoding the symbol using the escape context.
How does this affect the example used for the letter u? As it turns out, it makes an enormous difference. The first u symbol that took eight bits in the previous example will take about eight bits here as well. The escape code takes no bits to encode, and in the escape context the u has a 1/257 probability. After that, however, the u is added to the table and given a count of 1. The next appearance of u will require only one bit to encode, since it has a probability of 1/2. By the time 16 u’s have appeared, and while the previous model is still taking four bits to encode it, the escape-driven model will take less than a tenth of a bit!
The escape code frees us from burdening our models with characters that may never appear. This lets the model adjust rapidly to changing probabilities and quickly reduces the number of bits needed to encode high-probability symbols.
The encoding process for this particular implementation of a multi-order model requires only a few modifications to the previous program. The convert_int_to_symbol() routine now has to check whether a symbol is present in the given context. If not, the escape code is encoded instead, and the function returns the appropriate result to the main encoding loop, as shown:
context = 0;
initialize_model();
initialize_arithmetic_encoder();
for ( ; ; ) {
    c = getc( input );
    if ( c == EOF )
        c = END_OF_STREAM;
    escaped = convert_int_to_symbol( c, context, &s );
    encode_symbol( output, &s );
    if ( escaped ) {
        convert_int_to_symbol( c, ESCAPE, &s );
        encode_symbol( output, &s );
    }
    if ( c == END_OF_STREAM )
        break;
    update_model( c, context );
    context = c;
}
In the main compression loop shown, the compressor first tries to send the original symbol. If the convert_int_to_symbol() routine returns a true, the symbol did not appear in the current context, and the routine resends the symbol using the escape context. We update just the current context model with the symbol just sent, not the escape model.
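The escape-aware model routines are not listed in the text, so the following is only a guess at how they might look, not the code from the program disk. It assumes ESCAPE is symbol 257, that every ordinary context starts with the escape code as its only nonzero count, that the escape context gives all 258 symbols a count of 1, and that SYMBOL holds three unsigned short fields. Rescaling is omitted.

```c
#include <assert.h>

#define END_OF_STREAM 256
#define ESCAPE        257
#define NUM_SYMBOLS   258   /* 256 bytes + END_OF_STREAM + ESCAPE (assumed) */

typedef struct {
    unsigned short low_count;
    unsigned short high_count;
    unsigned short scale;
} SYMBOL;                   /* assumed layout */

/* totals[ context ][ i ] is the sum of the counts of all symbols below i. */
static int totals[ NUM_SYMBOLS ][ NUM_SYMBOLS + 1 ];

void initialize_model( void )
{
    int context, i;

    for ( context = 0 ; context < NUM_SYMBOLS ; context++ ) {
        for ( i = 0 ; i <= NUM_SYMBOLS ; i++ )
            totals[ context ][ i ] = 0;
        totals[ context ][ ESCAPE + 1 ] = 1;   /* only ESCAPE has a count */
    }
    for ( i = 0 ; i <= NUM_SYMBOLS ; i++ )     /* flat escape context */
        totals[ ESCAPE ][ i ] = i;
}

/* Fill in s for symbol c. If c has a zero count in this context, fill in
 * the escape code instead and return 1 so the caller can fall back. */
int convert_int_to_symbol( int c, int context, SYMBOL *s )
{
    assert( c >= 0 && c < NUM_SYMBOLS );
    s->scale = (unsigned short) totals[ context ][ ESCAPE + 1 ];
    s->low_count = (unsigned short) totals[ context ][ c ];
    s->high_count = (unsigned short) totals[ context ][ c + 1 ];
    if ( s->low_count != s->high_count )
        return 0;
    s->low_count = (unsigned short) totals[ context ][ ESCAPE ];
    s->high_count = (unsigned short) totals[ context ][ ESCAPE + 1 ];
    return 1;
}

/* Add one to the symbol's count; the escape context is never passed here. */
void update_model( int symbol, int context )
{
    int i;

    for ( i = symbol + 1 ; i <= ESCAPE + 1 ; i++ )
        totals[ context ][ i ]++;
}
```

With this layout, the first u in the q context returns an escape whose range covers the whole scale, and after one update the u occupies half of a scale of 2, matching the 1/2 probability described earlier.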
The decompression loop for this program follows a similar pattern. The code shown next makes one or two passes through the inner loop, depending on whether an escape code is detected. The source for this order-1 context-switching program is on the program diskette that accompanies this book.
context = 0;
initialize_model();
initialize_arithmetic_decoder( input );
for ( ; ; ) {
    last_context = context;
    do {
        get_symbol_scale( context, &s );
        count = get_current_count( &s );
        c = convert_symbol_to_int( count, context, &s );
        remove_symbol_from_stream( input, &s );
        context = c;
    } while ( c == ESCAPE );
    if ( c == END_OF_STREAM )
        break;
    putc( (char) c, output );
    update_model( c, last_context );
    context = c;
}
Improvements
The main problem with the encoding method in ARITH-1.C is the high cost of maintaining the model. Each time we update the counts for symbol c, every count in totals[context][] from c up to 256 has to be incremented. An average of 128 increment operations have to be performed for every character encoded or decoded. For a simple demonstration program like the one shown here, this may not be a major problem, but a production program should be modified to be more efficient.
One way to reduce the number of increment operations is to move the counts for the most frequently accessed symbols to the top of the array. This requires the model to keep track of each symbol’s position in the totals[context] array, but it reduces the number of increment operations by an order of magnitude. This is a relatively simple enhancement to make to this program. A very good example of a program that uses this technique has been published as part of the paper by Ian H. Witten, Radford M. Neal, and John G. Cleary, “Arithmetic Coding for Data Compression,” Communications of the ACM (June 1987). This paper is an excellent source of information regarding arithmetic coding, with some sample C source code illustrating the text.
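The frequency-ordering idea can be sketched for a single context as follows. This is a minimal illustration written for this discussion, not the code from the paper or the program disk: all names are hypothetical, the alphabet is assumed to be 257 symbols, and rescaling is omitted. Symbols are kept sorted by descending frequency, and cum_freq[ i ] holds the total of the frequencies at positions above i, so an update touches only the entries in front of the symbol's position.

```c
#include <assert.h>

#define NSYM 257   /* 256 byte values + end-of-stream (assumed) */

static int sym_to_idx[ NSYM ];      /* position of each symbol, 1..NSYM */
static int idx_to_sym[ NSYM + 1 ];  /* symbol stored at each position */
static int freq[ NSYM + 1 ];        /* freq[ 0 ] is a sentinel */
static int cum_freq[ NSYM + 1 ];    /* sum of freq[ i + 1 .. NSYM ] */

void init_sorted_model( void )
{
    int i;

    for ( i = 0 ; i < NSYM ; i++ ) {
        sym_to_idx[ i ] = i + 1;
        idx_to_sym[ i + 1 ] = i;
    }
    for ( i = 0 ; i <= NSYM ; i++ ) {
        freq[ i ] = 1;
        cum_freq[ i ] = NSYM - i;
    }
    freq[ 0 ] = 0;   /* sentinel: never equal to a real frequency */
}

/* Count one occurrence of symbol. Returns the number of cumulative
 * entries that had to be incremented. */
int update_sorted_model( int symbol )
{
    int i, j, touched = 0;

    assert( symbol >= 0 && symbol < NSYM );
    i = sym_to_idx[ symbol ];
    /* Find the front-most position holding the same frequency. */
    for ( j = i ; freq[ j - 1 ] == freq[ i ] ; j-- )
        ;
    if ( j < i ) {
        /* Swap the two symbols; their frequencies are equal, so the
           frequency and cumulative arrays do not change. */
        int other = idx_to_sym[ j ];
        idx_to_sym[ j ] = symbol;
        idx_to_sym[ i ] = other;
        sym_to_idx[ symbol ] = j;
        sym_to_idx[ other ] = i;
    }
    freq[ j ]++;
    while ( j > 0 ) {
        j--;
        cum_freq[ j ]++;
        touched++;
    }
    return touched;
}
```

A frequently seen symbol migrates to position 1, where an update touches a single cumulative entry instead of the dozens or hundreds needed when symbols stay in alphabetical order.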