Arithmetic Coding
6.3 What’s bad about dfwld coding and some ways to fix it
6.3.1 Supplying the source word length
In the method of Section 6.1, the encoder must supply the decoder with the length N of the source text, but how is this to be done? If the encoder sends the binary representation of N at the beginning of the code stream, how is the decoder to know when that representation is finished and the code proper begins?
Some possibilities:
1. If it is known that there will never be more than M source letters in any source text to be dealt with, then you can reserve $\lfloor \log_2 M \rfloor + 1$ bits at the beginning of the code text for transmitting N. If no bound on the source text length is known, you can still use this device by choosing M reasonably large and reserving $\lfloor \log_2 M \rfloor + 2$ bits at the beginning of the code text; the last of these bits is a “warning bit” which, if set to 1, warns the decoder that the expression for N has overflowed the allotted space and will continue into the next block of $\lfloor \log_2 M \rfloor + 2$ bits, equipped with its own warning bit; and so on. The code proper finally commences after the first block whose warning bit is zero.
The disadvantage of this solution to the problem of supplying N is that it compounds the problem discussed in 6.3.3, below. Presumably the encoder will keep a count of the source letters while encoding. If the encoder is to convey the number N of source letters at the beginning of the code text, then there will be a great delay; the decoder will not even get a peek at the partial code word supplied by rescaling until the encoding is complete. This is all right in those leisurely situations in which the encoded, compressed text is to be stored away and decompressed later, but the other kind of situation is encountered with increasing frequency.
We could convey N or a running count of the source letters encoded in some other location outside the main code stream. However, providing companion locations or parallel streams is inconvenient precisely in those situations when we are in a hurry and hope to decode on the heels of encoding. We will have more to say about this in 6.3.3.
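The warning-bit device of possibility 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the names `encode_length` and `decode_length` are hypothetical, bits are represented as a Python string of '0' and '1' characters, and the binary expression for N is assumed to be sent most significant block first (the text does not fix the order).

```python
def encode_length(n, block_bits):
    """Header for N: blocks of `block_bits` data bits, each followed by a warning bit.

    Assumption: the most significant block is sent first.
    """
    if n < 0 or block_bits < 1:
        raise ValueError("need n >= 0 and block_bits >= 1")
    digits = bin(n)[2:]
    # Pad on the left so the digits split evenly into blocks.
    digits = "0" * ((-len(digits)) % block_bits) + digits
    blocks = [digits[i:i + block_bits] for i in range(0, len(digits), block_bits)]
    header = []
    for index, block in enumerate(blocks):
        # Warning bit 1: the expression for N continues into the next block.
        warning = "1" if index < len(blocks) - 1 else "0"
        header.append(block + warning)
    return "".join(header)


def decode_length(bits, block_bits):
    """Return (N, position at which the code proper begins)."""
    n, pos = 0, 0
    while True:
        block = bits[pos:pos + block_bits]
        warning = bits[pos + block_bits]
        n = (n << block_bits) | int(block, 2)
        pos += block_bits + 1
        if warning == "0":
            return n, pos


M = 255                          # assumed "reasonably large" bound
block_bits = M.bit_length()      # equals floor(log2 M) + 1, here 8
header = encode_length(1000, block_bits)
print(header, decode_length(header + "1011", block_bits))
```

Note that the decoder still cannot start until the header has arrived, which is exactly the delay complained of above.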
2. The method of Exercise 6.1.2 smoothly communicates the source word length by adding a certain number of zeros onto the code word. The method imposes an extra burden of computation on both the encoder and the decoder, but this disadvantage is not as important as the fact that this method appears to be incompatible with solutions to the problem addressed in 6.3.3; how can decoding proceed on the heels of encoding if the decoder does not know whether a string of zeros in the code stream is part of the regular code word or part of the extra zeros at the end? This problem could be dealt with by providing a marker of, or a pointer to, the end of the regular code word, but this would again raise the technical difficulty of supplying an extra location or stream of information outside the main code stream.
3. The algorithm of Section 6.4, yet to come, eliminates the necessity of counting the source letters, at the cost of introducing an extra source letter, usually called EOF, for “end of file.” This extra letter will be used once, to mark the end of the source text.
The disadvantage in this trick is that compression will be somewhat less than optimal, not so much because of the extra bits required for EOF as because the original relative source frequencies will have to be trimmed a bit to make room for the small relative frequency to be assigned to EOF. However, the entropy of the modified source can be made as close as desired to the original entropy by making $f_{\mathrm{EOF}}$ sufficiently small (because $x \log_2 x$ is a continuous function of $x > 0$ and $x \log_2 x \to 0$ as $x \downarrow 0$). Therefore, by analysis similar to that in the proof of Theorem 6.2.1 and the remarks following, you can get within a cat’s whisker of optimal lossless compression, on the average, for long source texts, using EOF and the algorithm of Section 6.4.
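A quick numerical check of the entropy claim may be reassuring. The sketch below assumes one particular way of trimming the frequencies, namely scaling every original relative frequency by 1 − f_EOF; the text does not prescribe how the trimming is done. The function names and the sample frequencies are illustrative only.

```python
from math import log2


def entropy(freqs):
    """Entropy in bits per source letter of a list of relative frequencies."""
    return -sum(f * log2(f) for f in freqs if f > 0)


def with_eof(freqs, f_eof):
    """Assumed trimming rule: scale each frequency by (1 - f_eof), then append EOF."""
    return [f * (1 - f_eof) for f in freqs] + [f_eof]


freqs = [0.4, 0.3, 0.2, 0.1]
base = entropy(freqs)
for f_eof in (1e-2, 1e-4, 1e-6):
    gap = entropy(with_eof(freqs, f_eof)) - base
    print(f"f_EOF = {f_eof:g}: entropy changes by {gap:.6f} bits per letter")
```

The printed gap shrinks toward zero as f_EOF does, in line with the continuity argument above.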
6.3.2 Computation
The arithmetic coding method of Section 6.1 requires exact computations, both in encoding and decoding. These are costly, especially the multiplications. The length of $A(i_1,\dots,i_N)$ is $\prod_{j=1}^{N} f_{i_j}$, which on the average requires around N times as many bits to store (never mind compute) as the average number of bits required to store each of the (rational) numbers $f_1,\dots,f_m$.
Rescaling and the underflow expansion may appear to relieve the burden of exact computation. However, note that these operations involve multiplying the interval lengths by powers of two. Therefore, odd factors of the denominators of the $f_j$ are never reduced by rescaling, and, if bigger than one, will cause the complexity of and storage space required for exact computations to grow inexorably, approximately linearly with the number of source letters. This observation inspires the first of three suggestions for lessening the burden of exact computation.
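The growth of the denominators is easy to observe with exact rational arithmetic. The following sketch uses Python's `fractions.Fraction`; the frequencies 2/5, 3/10, 1/5, 1/10 and the helper names are chosen for illustration and are not part of the method of Section 6.1.

```python
from fractions import Fraction


def interval_length(freqs, word):
    """Exact length of A(i_1, ..., i_N): the product of the letters' frequencies."""
    length = Fraction(1)
    for letter in word:
        length *= freqs[letter]
    return length


def odd_part(n):
    """Strip every factor of two from a positive integer."""
    while n % 2 == 0:
        n //= 2
    return n


freqs = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
for n in (10, 20, 40):
    length = interval_length(freqs, [0] * n)   # the word s1 s1 ... s1
    rescaled = length * 2**7                   # rescaling multiplies by a power of two
    print(n,
          length.denominator.bit_length(),
          odd_part(rescaled.denominator) == odd_part(length.denominator))
```

The bit length of the denominator grows roughly in proportion to the word length, and multiplying by a power of two leaves the odd part of the denominator (here a power of 5) untouched.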
Replace the $f_j$ by approximations which are dyadic fractions. For example, if $m = 4$ and $f_1 = .4$, $f_2 = .3$, $f_3 = .2$, and $f_4 = .1$, you could take $f_1 = \frac{102}{256}$, $f_2 = \frac{77}{256}$, $f_3 = \frac{51}{256}$, and $f_4 = \frac{26}{256}$, these being the closest (by most definitions of closeness) dyadic fractions with common denominator 256 to the actual values of $f_1$, $f_2$, $f_3$, and $f_4$. It may appear that replacing the $f_j$ in this case by these approximations will actually increase the burden of computation, because the approximations are nastier-looking fractions than the original $f_j$, and this is indeed a consideration; we could make life easier if we replaced the $f_j$ by dyadic fraction approximations with denominator 8 or 16, but then our approximations would not be very close to the true relative frequencies, and that might affect compression deleteriously. [It can be shown, by analysis similar to that in the proof of Theorem 6.2.1, that as approximate relative frequencies tend to the true relative frequencies, the average number of bits per source letter in code resulting from arithmetic coding of source words of length N, using the approximate relative frequencies, will eventually be bounded above by $H(S) + \frac{1+\epsilon}{N}$, for any $\epsilon > 0$, where $H(S)$ is the true source entropy. Thus good approximations ensure good approaches to optimal lossless encoding; however, not much is known about the penalty to be paid for bad approximations.]
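One way to produce dyadic approximations whose numerators add up to the common denominator is a largest-remainder rounding, sketched below. The rounding rule and the name `dyadic_counts` are assumptions made for illustration; the text only asks for the closest dyadic fractions and does not say how to find them.

```python
from fractions import Fraction
from math import floor


def dyadic_counts(freqs, k):
    """Numerators c_j with c_j / 2**k close to freqs[j] and sum(c_j) == 2**k.

    Assumes the freqs are exact Fractions summing to 1. Note that a very
    rare letter can end up with numerator 0 when k is small.
    """
    denom = 1 << k
    scaled = [f * denom for f in freqs]
    counts = [floor(s) for s in scaled]
    shortfall = denom - sum(counts)
    # Hand the leftover units to the letters with the largest fractional parts.
    order = sorted(range(len(freqs)),
                   key=lambda j: scaled[j] - counts[j],
                   reverse=True)
    for j in order[:shortfall]:
        counts[j] += 1
    return counts


freqs = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
print(dyadic_counts(freqs, 8))   # [102, 77, 51, 26], as in the example above
print(dyadic_counts(freqs, 4))   # [6, 5, 3, 2], a coarser approximation
```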
Encode blocks of source letters of a certain fixed length. After each block is encoded, the encoder starts over on the next block. The computational advantage is that the denominators of the rational numbers that give the interval endpoints and lengths cannot grow without bound, even if the relative source frequencies are not dyadic fractions, since the calculations start over periodically.³
But how will the decoder know where the code for one block ends and the code for the next block begins? If an efficient method of providing non-binary markers or a parallel pointer/counter stream outside the main code stream is ever devised, this might be a good place to use it. In the absence of any such technical convenience, we could use a modification of the method of Section 6.4, with the artificial source letter EOB for “end of block” to be inserted by the encoder into the source text at the end of each block. Of course, this device costs something in diminished compression. The longer the blocks, the less the cost of EOB, but the greater the cost of computation.
Use approximate arithmetic. This third suggestion for avoiding computational arthritis in arithmetic coding is the method actually used in the proposed implementation in Section 6.4. The interval $[0,1)$ is replaced by an “interval” of consecutive integers, $\{0,\dots,M-1\}$, which we will continue to denote $[0,M)$, and in subdivisions of this interval the source words are allocated blocks of consecutive integers approximately as they would be allocated in pure dfwld arithmetic coding using the full interval of real numbers from 0 to M. Thus, if $f_1 = .4$, $f_2 = .3$, $f_3 = .2$, $f_4 = .1$, and $M = 16$, then $A(1) = A(s_1) = \{0,1,2,3,4,5\} = [0,6)$, $A(2) = A(s_2) = \{6,7,8,9,10\} = [6,11)$, etc. Are you worried that subsequent subdivision will shrink the intervals to lengths less than one so that they may fail to contain any integers at all? Well may you worry! This unpleasant possibility is taken care of by starting with M sufficiently large, with respect to the relative source frequencies, and by rescaling and applying the underflow expansion.
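The integer subdivision, and the worry about vanishing intervals, can be seen in a small experiment. The sketch below assumes one particular rounding rule, taking the floor of each cumulative boundary; this reproduces A(1) and A(2) as given above, but it is not necessarily the rule used in Section 6.4. The name `subdivide` is hypothetical.

```python
from fractions import Fraction
from math import floor


def subdivide(lo, hi, freqs):
    """Split the integer block [lo, hi) into one half-open block per source letter.

    Assumed rule: each boundary is lo + floor((hi - lo) * cumulative frequency).
    """
    width = hi - lo
    bounds = [lo]
    cumulative = Fraction(0)
    for f in freqs:
        cumulative += f
        bounds.append(lo + floor(width * cumulative))
    return list(zip(bounds[:-1], bounds[1:]))


freqs = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
print(subdivide(0, 16, freqs))  # [(0, 6), (6, 11), (11, 14), (14, 16)]
print(subdivide(0, 6, freqs))   # [(0, 2), (2, 4), (4, 5), (5, 6)]
print(subdivide(5, 6, freqs))   # [(5, 5), (5, 5), (5, 5), (5, 6)]
```

With M = 16 and no rescaling, the third level already yields empty blocks (5, 5) for three of the four letters, which is why M must be large relative to the frequencies and why rescaling and the underflow expansion are needed.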
This trick solves the problem of exact computation by simply doing away with exact computation. The disadvantage lies in the level of compression achievable. This disadvantage has been considered in [8, 33, 48], but there is room for further analysis. Some experimental results comparing pure dfwld arithmetic encoding involving exact computation with slapdash integer-interval methods of the type in Section 6.4 appear in [51].
³ You might observe that block arithmetic encoding amounts to using an encoding scheme for $S^N$, where N is the length of the source blocks. This encoding scheme definitely does not satisfy the prefix condition; for instance, the single digit 1 is the code word representative of some member of $S^N$, and is also the first digit of the code representatives of a great many others.
However, the luxury of instantaneous decoding available with a prefix-condition encoding scheme for $S^N$ is illusory. If $|S|$ is fairly large, say $|S| = 256$, and N is fairly hefty, say $N = 10$, then an encoding scheme for $S^N$ would have a huge number, $|S^N| = |S|^N$, of lines; $256^{10} = 2^{80}$ is an unmanageable number of registers necessary to store an encoding scheme. So a nice prefix-condition scheme for $S^N$ is of no practical value in any case. You can think of the decoding process in block arithmetic coding as a clever and relatively efficient way of looking up code words in an encoding scheme without having actually to store the scheme.
Note that if M is a multiple, or, better yet, a power, of the common denominator of $f_1,\dots,f_m$, then interval subdivision of the blocks of integers substituting for real intervals in this form of arithmetic coding is sometimes exact. In practice, M is usually taken to be a power of 2, $M = 2^K$. It might be a shrewd move in these cases to replace the $f_j$ by dyadic fraction approximations with common denominator $2^k$, with k being an integer divisor of K. (Of course, it is a luxury to know the $f_j$ beforehand. In adaptive arithmetic coding, to be described in Chapter 8, the $f_j$ are changing as the source is processed, and it is not convenient to repeatedly replace them by approximations.)
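The word “sometimes” can be made concrete. The sketch below takes K = 16 and k = 8 with the numerators 102, 77, 51, 26 over 256 as illustrative choices, and checks whether a block of a given width can be split exactly, with no rounding. The helper name is hypothetical and rescaling is deliberately left out.

```python
def subdivision_is_exact(width, counts, k):
    """True if width * c / 2**k is an integer for every numerator c."""
    return all((width * c) % (1 << k) == 0 for c in counts)


counts = [102, 77, 51, 26]   # dyadic approximations with denominator 2**8
k, K = 8, 16
M = 1 << K                   # M = 2**K is a power of the common denominator

first = M * counts[0] >> k       # width of A(1): 102 * 2**8
second = first * counts[0] >> k  # width of A(1, 1): 102 * 102

print(subdivision_is_exact(M, counts, k))       # True
print(subdivision_is_exact(first, counts, k))   # True
print(subdivision_is_exact(second, counts, k))  # False
```

Under these assumptions the first K/k = 2 levels of subdivision are exact, after which rounding sets in unless rescaling has restored enough factors of two to the width.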