Huffman coding for stationary Markov sources

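finding a vector in the null space (kernel) of $M - I$. Using row reduction we obtain a kernel vector proportional to $(1, 3, 4)^T$, and we divide by 8 (the sum of its entries) to give the vector of equilibrium probabilities $p = \frac{1}{8}(1, 3, 4)^T$.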

NOTE: In the exam we will give you the vector $p$. You then need only check that $p$ is a probability vector and that $Mp = p$.
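Both checks in the NOTE can be mechanised. The following is a minimal sketch, assuming the matrix $M$ of Example 3.13 (column $j$ holds the probabilities that $s_1, s_2, s_3$ follow $s_j$); the helper names `is_probability_vector`, `mat_vec` and `is_equilibrium` are hypothetical. Exact fractions are used so that the test $Mp = p$ is not spoiled by rounding.

```python
from fractions import Fraction as F

# Transition matrix of Example 3.13: column j holds the probabilities
# that s1, s2, s3 follow s_j, so every column sums to 1.
M = [
    [F("0.3"), F("0.1"), F("0.1")],
    [F("0.5"), F("0.1"), F("0.55")],
    [F("0.2"), F("0.8"), F("0.35")],
]

def is_probability_vector(p):
    """True if all entries are non-negative and they sum to 1."""
    return all(x >= 0 for x in p) and sum(p) == 1

def mat_vec(M, p):
    """Matrix-vector product Mp."""
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

def is_equilibrium(M, p):
    """The two checks from the NOTE: p is a probability vector and Mp = p."""
    return is_probability_vector(p) and mat_vec(M, p) == p

p = [F(1, 8), F(3, 8), F(4, 8)]
print(is_equilibrium(M, p))  # True
```

Floating-point entries would also work with a tolerance, but fractions keep the comparison exact for hand-sized examples.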

3.9 Huffman coding for stationary Markov sources

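We construct $q$ possibly different Huffman codes, one for each previous symbol $s_j$, based on the probabilities that $s_1, \dots, s_q$ follow $s_j$. Recall that the probability that $s_i$ follows $s_j$ is the $(i, j)$ entry of $M$, so these probabilities form the $j$th column of $M$.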

Remembering to re-order so that the probabilities are non-increasing, construct a Huffman code $\mathrm{Huff}_{(j)}$ for this situation, with average length $L_{(j)}$.

We assume that the Markov source is in equilibrium, so $s_j$ occurs with probability $p_j$. Therefore the code $\mathrm{Huff}_{(j)}$ will be used with probability $p_j$, and the overall average codeword length will be

$$L_M = p_1 L_{(1)} + p_2 L_{(2)} + \dots + p_q L_{(q)}.$$

This is the average length for the Markov Huffman code. (We write HuffM to denote this code.)

This ignores the code for the first symbol, since it is not preceded by any other symbol. The first symbol is normally encoded using the Huffman code for the equilibrium probabilities, denoted HuffE, of average length LE.

The above formula for $L_M$ is therefore not strictly correct, but if a large number of source symbols are encoded, the effect of the slightly different treatment of the first symbol is negligible, so we ignore it in $L_M$ calculations.

Example 3.13 Using the above

$$M = \begin{pmatrix} 0.3 & 0.1 & 0.1 \\ 0.5 & 0.1 & 0.55 \\ 0.2 & 0.8 & 0.35 \end{pmatrix}$$

with equilibrium $p = \frac{1}{8}(1, 3, 4)^T$, we have equilibrium Huffman code

Huff_E prob× 8 code

s1 1 01

s2 3 00

s3 4 1

with $L_E = \frac{1}{8}(2 \times 1 + 2 \times 3 + 1 \times 4) = 1.5$.

The Huffman code for symbols following the symbol s₁ is Huff₍₁₎

s₁ 0.3 00

s2 0.5 1

s3 0.2 01

with $L_{(1)} = 2 \times 0.3 + 1 \times 0.5 + 2 \times 0.2 = 1.5$.

The Huffman code for symbols following the symbol s₂ is Huff(2)

s1 0.1 10

s2 0.1 11

s₃ 0.8 0

with $L_{(2)} = 2 \times 0.1 + 2 \times 0.1 + 1 \times 0.8 = 1.2$.

Finally, the Huffman code for symbols following the symbol s3 is Huff(3) (built from the probabilities of each symbol following s3)

s1 0.1 11

s2 0.55 0

s3 0.35 10

with $L_{(3)} = 2 \times 0.1 + 1 \times 0.55 + 2 \times 0.35 = 1.45$.

Then the average length of the Markov Huffman code (ignoring the “first symbol effect”) is

$$L_M = \tfrac{1}{8} \times 1.5 + \tfrac{3}{8} \times 1.2 + \tfrac{1}{2} \times 1.45 \approx 1.36.$$

So LM < LE as expected, since LE is the average length if you ignore the Markov dependencies among the source symbols.
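The numbers in Example 3.13 can be reproduced with a short script. This is a sketch, not the notes' own method: it builds binary Huffman codeword lengths with a priority queue (merging the two least probable nodes, each merge adding one bit to every symbol inside them) and then applies $L_M = p_1 L_{(1)} + \dots + p_q L_{(q)}$. The function names are hypothetical. Huffman codes are not unique when probabilities tie, but the average length is the same for every valid choice.

```python
import heapq
from fractions import Fraction as F

def huffman_lengths(probs):
    """Binary Huffman codeword lengths for the given probabilities."""
    if len(probs) == 1:
        return [1]  # assumption: a lone symbol still gets a 1-bit codeword
    lengths = [0] * len(probs)
    # Heap entries: (probability, unique tie-break counter, symbols in node)
    heap = [(pr, i, [i]) for i, pr in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # every symbol under the merged node gains a bit
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def average_length(probs):
    return sum(pr * l for pr, l in zip(probs, huffman_lengths(probs)))

def markov_average_length(M, p):
    """L_M = p_1 L_(1) + ... + p_q L_(q), where Huff_(j) uses column j of M."""
    q = len(p)
    return sum(p[j] * average_length([M[i][j] for i in range(q)])
               for j in range(q))

# Data of Example 3.13
M = [[F("0.3"), F("0.1"), F("0.1")],
     [F("0.5"), F("0.1"), F("0.55")],
     [F("0.2"), F("0.8"), F("0.35")]]
p = [F(1, 8), F(3, 8), F(4, 8)]

print(float(average_length(p)))            # 1.5
print(float(markov_average_length(M, p)))  # 1.3625
```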

Now, if we simply used a binary block code to encode this example, ignoring the source probabilities completely, we would need 2 bits per symbol, so $L_{\text{block}} = 2$, which is considerably worse than either $L_E$ or $L_M$.

Hence using the variable-length code HuffE compresses the message length by a factor of

$$\frac{L_E}{L_{\text{block}}} = \frac{1.5}{2} = 0.75,$$

that is, 25% saved.

Using HuffM compresses by $\frac{1.36}{2} = 0.68$, that is, 32% saved.

Even if we ignore the source probabilities completely and assume that each source symbol is equally likely, we obtain the Huffman code

s₁ 1/3 1

s₂ 1/3 00

s₃ 1/3 01

which gives $L = \frac{1}{3}(1 + 2 + 2) = \frac{5}{3} \approx 1.67$.

Here the compression factor is $\frac{1.67}{2} \approx 0.83$, that is, 17% saved.

Example 3.14 Finally, here is an example of using HuffM to encode with the above M.

The message s1s2s3s3s2s1s2 gets coded via

s1 first, so use HuffE giving 01

s2 after s1 so use Huff(1) 1

s3 s2 Huff(2) 0

s3 s3 Huff(3) 10

s2 s3 Huff(3) 0

s₁ s₂ Huff₍₂₎ 10

s2 s1 Huff(1) 1

so this is encoded as 0110100101. Here $\frac{\text{length}}{7} = \frac{10}{7} \approx 1.43$ is the code length per symbol.

Since each of the Huffman codes used is an I-code, and the decoder always knows the previous symbol and hence which code is in use, any valid encoding will decode uniquely.
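Example 3.14 can be expressed as a small encoder. This is a minimal sketch, assuming the codeword tables of Example 3.13 stored as dictionaries; the names `HUFF_E`, `HUFF` and `encode_markov` are hypothetical. The key point is that the code for each symbol is selected by the symbol before it.

```python
# Codeword tables from Example 3.13.
HUFF_E = {"s1": "01", "s2": "00", "s3": "1"}  # equilibrium code, first symbol only
HUFF = {  # HUFF[prev] is Huff_(j) for prev = s_j
    "s1": {"s1": "00", "s2": "1", "s3": "01"},
    "s2": {"s1": "10", "s2": "11", "s3": "0"},
    "s3": {"s1": "11", "s2": "0", "s3": "10"},
}

def encode_markov(message):
    """Encode a list of symbols with Huff_M: the first symbol uses Huff_E,
    every later symbol uses the code chosen by the symbol before it."""
    bits = [HUFF_E[message[0]]]
    for prev, cur in zip(message, message[1:]):
        bits.append(HUFF[prev][cur])
    return "".join(bits)

msg = ["s1", "s2", "s3", "s3", "s2", "s1", "s2"]
print(encode_markov(msg))  # 0110100101
```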

Example 3.15 We decode by running the above in reverse: decode the 1st symbol using HuffE, giving $s_{i_1}$; then decode the 2nd symbol using $\mathrm{Huff}_{(i_1)}$, giving $s_{i_2}$; the 3rd using $\mathrm{Huff}_{(i_2)}$; and so on.

first using HuffE 01 is first codeword and decodes as s1

next using Huff(1) 1 is next codeword and decodes as s2

Huff₍₂₎ 0 s₃

Huff(3) 10 s3

Huff(3) 0 s2

Huff₍₂₎ 10 s₁

Huff(1) 1 s2
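The decoding procedure of Example 3.15 can be sketched the same way. This is an illustrative sketch, again assuming the Example 3.13 tables (repeated so the block stands alone) and hypothetical function names. Because each code is an I-code, reading bits until they match a codeword in the current table is enough; the symbol just decoded then selects the table for the next one.

```python
HUFF_E = {"s1": "01", "s2": "00", "s3": "1"}
HUFF = {
    "s1": {"s1": "00", "s2": "1", "s3": "01"},
    "s2": {"s1": "10", "s2": "11", "s3": "0"},
    "s3": {"s1": "11", "s2": "0", "s3": "10"},
}

def decode_prefix(bits, pos, table):
    """Read one codeword of the prefix code `table` starting at bits[pos].
    Returns (symbol, position just after the codeword)."""
    inverse = {cw: sym for sym, cw in table.items()}
    for end in range(pos + 1, len(bits) + 1):
        if bits[pos:end] in inverse:
            return inverse[bits[pos:end]], end
    raise ValueError("bit string ends in the middle of a codeword")

def decode_markov(bits):
    symbols, pos = [], 0
    table = HUFF_E  # first symbol: equilibrium code
    while pos < len(bits):
        sym, pos = decode_prefix(bits, pos, table)
        symbols.append(sym)
        table = HUFF[sym]  # next symbol: code chosen by the one just decoded
    return symbols

print(decode_markov("0110100101"))
# ['s1', 's2', 's3', 's3', 's2', 's1', 's2']
```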

3.10 Other Text Compression Methods

There are many ways of approaching text compression and many considerations: for example, compression efficiency, compression speed and memory use, and decompression speed and memory use.

The model of the source symbols is also important. Some methods use a static model, where the probabilities of the source symbols are fixed. Others are adaptive: the probabilities are updated after each symbol is received, to reflect the relative frequencies seen so far.

We discuss two methods.

3.10.1 Arithmetic Coding

Arithmetic coding is a form of compression coding which encodes the message into a rational number between 0 and 1. It is very efficient and approaches the entropy limit faster than Huffman coding. This does not contradict the optimality of Huffman codes among UD-codes, because arithmetic coding is not a UD-code. The reason it is so efficient is that it does not need to use an integral number of bits for each symbol of the message. Huffman coding is more commonly used, as arithmetic coding is more computationally expensive to implement. Arithmetic coding is also subject to patents.

The idea is to assign a subinterval of $[0, 1) \subseteq \mathbb{R}$ to the message and successively narrow this subinterval down as each symbol is encoded.

The message must end with a stop symbol •, and after the subinterval corresponding to the message plus • has been found, any suitable single number in the subinterval is transmitted: this is the actual code number or codeword.

Encoding:

First, each symbol is associated with a subinterval of [0, 1) whose length is the relative frequency of the symbol.

These subintervals are chosen so they do not overlap, but fill [0, 1).

If the “message so far” subinterval is $[s, s + w)$, where $s$ is the start and $w$ is the width, and the next symbol has associated subinterval $[s', s' + w')$, then the new message subinterval is the $[s', s' + w')$ part of $[s, s + w)$.

That is, the new message subinterval becomes

$$[\, s + s'w,\ (s + s'w) + w'w \,).$$

Decoding:

We reverse the above, by finding which of the symbol subintervals contains the code number.

This then gives the first symbol of the message.

The code number is then re-scaled to that interval as follows. If x is the code number and it lies in the subinterval [s, s + w), then the rescaled code number is (x− s)/w.

The process is repeated until the stop symbol • is encountered.
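The encoding and decoding rules above can be sketched with exact rational arithmetic. This is a minimal illustration, not a practical implementation (real coders use integer arithmetic and emit digits incrementally, as in Note 2 below); it assumes the fixed model of Example 3.16, and `MODEL`, `STOP`, `arith_encode` and `arith_decode` are hypothetical names. The encoder applies $[s + s'w, (s + s'w) + w'w)$ per symbol; the decoder finds the symbol interval containing $x$ and rescales by $(x - s)/w$.

```python
from fractions import Fraction as F

STOP = "•"
# Fixed model of Example 3.16: symbol -> (start s', width w') of its subinterval.
MODEL = {
    "s1": (F("0"), F("0.4")),
    "s2": (F("0.4"), F("0.2")),
    "s3": (F("0.6"), F("0.2")),
    "s4": (F("0.8"), F("0.1")),
    STOP: (F("0.9"), F("0.1")),
}

def arith_encode(message, model):
    """Return (start, width) of the message subinterval.
    The message is assumed to end with the stop symbol already."""
    s, w = F(0), F(1)
    for sym in message:
        s_sym, w_sym = model[sym]
        s, w = s + s_sym * w, w_sym * w  # both use the old w
    return s, w

def arith_decode(x, model):
    """Decode code number x until the stop symbol is reached."""
    out = []
    while True:
        for sym, (s_sym, w_sym) in model.items():
            if s_sym <= x < s_sym + w_sym:
                break
        else:
            raise ValueError("code number lies outside [0, 1)")
        out.append(sym)
        if sym == STOP:
            return out
        x = (x - s_sym) / w_sym  # rescale to the symbol's subinterval

msg = ["s1", "s2", "s1", "s4", "s2", "s1", "s3", "s3", "s1", STOP]
start, width = arith_encode(msg, MODEL)
print(float(start), float(start + width))
print(arith_decode(F("0.1870681"), MODEL) == msg)  # True
```

Using `Fraction` avoids floating-point drift, so the intermediate values agree digit for digit with the hand calculation in Examples 3.16 and 3.17.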

Example 3.16 Suppose we have a fixed probability model

probability subinterval start width

s₁ 0.4 [0, .4) 0 .4

s2 0.2 [.4, .6) .4 .2

s3 0.2 [.6, .8) .6 .2

s₄ 0.1 [.8, .9) .8 .1

• 0.1 [.9, 1) .9 .1

We wish to encode the message s1s2s1s4s2s1s3s3s1•

Encoding proceeds as follows:

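$$\begin{array}{lll}
 & \text{subinterval start} & \text{width} \\
\text{begin} & 0 & 1 \\
s_1 & 0 + 0 \times 1 = 0 & .4 \times 1 = .4 \\
s_2 & 0 + .4 \times .4 = .16 & .2 \times .4 = .08 \\
s_1 & .16 + 0 \times .08 = .16 & .4 \times .08 = .032 \\
s_4 & .16 + .8 \times .032 = .1856 & .1 \times .032 = .0032 \\
s_2 & .1856 + .4 \times .0032 = .18688 & .2 \times .0032 = .00064 \\
s_1 & .18688 + 0 \times .00064 = .18688 & .4 \times .00064 = 2.56 \times 10^{-4} \\
s_3 & .18688 + .6 \times 2.56 \times 10^{-4} = .1870336 & .2 \times 2.56 \times 10^{-4} = 5.12 \times 10^{-5} \\
s_3 & .1870336 + .6 \times 5.12 \times 10^{-5} = .18706432 & .2 \times 5.12 \times 10^{-5} = 1.024 \times 10^{-5} \\
s_1 & .18706432 + 0 \times 1.024 \times 10^{-5} = .18706432 & .4 \times 1.024 \times 10^{-5} = 4.096 \times 10^{-6} \\
\bullet & .18706432 + .9 \times 4.096 \times 10^{-6} = .1870680064 & .1 \times 4.096 \times 10^{-6} = 4.096 \times 10^{-7}
\end{array}$$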

The message subinterval is [.1870680064, .187068416), and we transmit, say, .1870681 as the code number or codeword. This is a shortest decimal in the subinterval: no decimal with 6 digits after the point lies in it, since .187068 is below the start and .187069 is above the end. Using a Huffman code for the same source probabilities would give a 22-bit message.
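Choosing the transmitted number can also be automated. This sketch (the name `shortest_decimal` is hypothetical) tries $k = 0, 1, 2, \dots$ digits after the point and, for each $k$, takes the smallest $k$-digit decimal that is at least the interval start, $\lceil \text{lo} \cdot 10^k \rceil / 10^k$; the first one below the interval end is a shortest decimal in $[\text{lo}, \text{hi})$. It assumes $0 \le \text{lo} < \text{hi} \le 1$.

```python
import math
from fractions import Fraction as F

def shortest_decimal(lo, hi):
    """Return (k, n) such that n / 10**k lies in [lo, hi) with as few
    digits k after the point as possible (the smallest such n for that k)."""
    k = 0
    while True:
        n = math.ceil(lo * 10**k)  # smallest k-digit decimal >= lo
        if F(n, 10**k) < hi:
            return k, n
        k += 1

k, n = shortest_decimal(F("0.1870680064"), F("0.187068416"))
print(f"0.{n:0{k}d}")  # 0.1870681
```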

Example 3.17 Decoding proceeds as follows:

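$$\begin{array}{lll}
\text{code number (rescaled)} & \text{in interval} & \text{so symbol} \\
.1870681 & [0, .4) & s_1 \\
(.1870681 - 0)/.4 = .46767025 & [.4, .6) & s_2 \\
(.46767025 - .4)/.2 = .33835125 & [0, .4) & s_1 \\
(.33835125 - 0)/.4 = .845878125 & [.8, .9) & s_4 \\
(.845878125 - .8)/.1 = .45878125 & [.4, .6) & s_2 \\
(.45878125 - .4)/.2 = .29390625 & [0, .4) & s_1 \\
(.29390625 - 0)/.4 = .734765625 & [.6, .8) & s_3 \\
(.734765625 - .6)/.2 = .673828125 & [.6, .8) & s_3 \\
(.673828125 - .6)/.2 = .369140625 & [0, .4) & s_1 \\
(.369140625 - 0)/.4 = .9228515625 & [.9, 1) & \bullet
\end{array}$$

so stop.

Note: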

1. Without the stop symbol, decoding would usually go on forever.

2. It is possible to implement arithmetic coding and decoding using integer arithmetic, and if we use binary representation for the subintervals then it can actually be done using binary arithmetic. Successive digits can be transmitted as soon as they are finalized; that is, when both start and end points of the interval agree to sufficiently many significant places. Decoding can also be done as the digits arrive. Hence arithmetic coding does not need much memory.

3. We have described a static version of arithmetic coding. But the process can also be made adaptive by continually adjusting the start/width values for the symbols as encoding and decoding take place; the encoder and decoder must make the same adjustments so that they stay in step.

3.10.2 Dictionary methods

In these methods, a dictionary is maintained and parts of the text are replaced by pointers to the appropriate dictionary entry. Examples are Ziv and Lempel’s algorithms LZ77 and LZ78, and Welch’s improvement of LZ78, called the LZW algorithm. These compression algorithms are used in gzip, GIF and PostScript. It is known that LZ78 and LZW approach the entropy limit for very long messages, but they approach it so slowly that in practice they never get close.

The LZ77 algorithm uses a system of pointers looking back in the data stream: the LZ78 algorithm creates a system of pointers (a dictionary) looking forward in the data stream. One drawback with LZ77 is that it takes a lot of time to encode, as it involves a lot of comparisons: decoding is quick though.

We describe the LZ78 algorithm for encoding a message m.

Encoding: start with an empty dictionary and set r = m. (Here r is the part of the message which we have not yet encoded.) Let “+” denote concatenation of symbols.

Repeat the following steps until r is empty:

(i) Let s be the longest prefix of r which corresponds to a dictionary entry. If there is no such prefix, then set s = ∅. (By prefix we mean a block of consecutive symbols of r which begins at the first symbol.)

(ii) Suppose that s is in line ℓ of the dictionary, and put ℓ = 0 if s = ∅. Add a new dictionary entry for s + c, where c is the next symbol after s in r. Output (ℓ, c) to the encoding and delete s + c from r.
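Steps (i) and (ii) can be sketched as follows. This is an illustrative sketch with a hypothetical function name, not a reference implementation. It uses the fact that every prefix of an LZ78 dictionary entry is itself an entry (each entry is an earlier entry plus one symbol), so the longest matching prefix can be found by extending one symbol at a time. The notes do not say what happens if s uses up all of r, leaving no next symbol c; as a labelled assumption, the sketch then outputs (ℓ, "") and stops.

```python
def lz78_encode(m):
    """LZ78 encoding of the string m into a list of (l, c) pairs."""
    dictionary = {}  # entry string -> line number (line 0 is the empty string)
    output = []
    i = 0            # r = m[i:] is the part not yet encoded
    while i < len(m):
        # (i) longest prefix s of r that is a dictionary entry (possibly empty)
        s = ""
        while i + len(s) < len(m) and m[i:i + len(s) + 1] in dictionary:
            s = m[i:i + len(s) + 1]
        ell = dictionary[s] if s else 0
        if i + len(s) == len(m):
            # assumption: r ends exactly on an entry, so there is no next
            # symbol c; emit (ell, "") and stop without a new entry
            output.append((ell, ""))
            break
        # (ii) add s + c as a new entry, output (ell, c), delete s + c from r
        c = m[i + len(s)]
        dictionary[s + c] = len(dictionary) + 1
        output.append((ell, c))
        i += len(s) + 1
    return output

print(lz78_encode("abbcbcababcaa"))
# [(0, 'a'), (0, 'b'), (2, 'c'), (3, 'a'), (2, 'a'), (4, 'a')]
```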

Example 3.18 Suppose we wish to encode the message m = abbcbcababcaa.

r s ℓ new dictionary entry output

abbcbcababcaa ∅ 0 1. a (0, a)

bbcbcababcaa ∅ 0 2. b (0, b)

bcbcababcaa b 2 3. bc (2, c)

bcababcaa bc 3 4. bca (3, a)

babcaa b 2 5. ba (2, a)

bcaa bca 4 6. bcaa (4, a)

The output is (0, a)(0, b)(2, c)(3, a)(2, a)(4, a). Ignoring brackets, this has 12 symbols, consisting of 6 numbers and 6 letters, whereas the original message has 13 symbols, all letters.

Much better compression rates are achieved when encoding long texts, or data streams with a lot of repetition, as the dictionary entries themselves become longer.

Decoding: start with an empty dictionary. If the next code pair is (ℓ, c) then output D(ℓ) + c to the message, where D(ℓ) denotes entry ℓ of the dictionary and D(0) = ∅.

Also add D(ℓ) + c to the dictionary. Repeat until all code pairs have been processed.

The dictionary after the decoding process is exactly the same as the dictionary after the encoding process.
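The decoding rule is even shorter. In this sketch (hypothetical name `lz78_decode`), the list `D` plays the role of the dictionary, with `D[0]` the empty string; the function returns the decoded message together with the dictionary entries so that the last remark can be checked against Example 3.18.

```python
def lz78_decode(pairs):
    """Decode LZ78 pairs (l, c): output D(l) + c and add it to the dictionary.
    Returns (message, dictionary entries 1, 2, ...)."""
    D = [""]  # D[0] is the empty string
    out = []
    for ell, c in pairs:
        entry = D[ell] + c
        out.append(entry)
        D.append(entry)
    return "".join(out), D[1:]

message, entries = lz78_decode([(0, "a"), (0, "b"), (2, "c"), (3, "a"), (2, "a"), (4, "a")])
print(message)  # abbcbcababcaa
print(entries)  # ['a', 'b', 'bc', 'bca', 'ba', 'bcaa']
```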


In document MATH3411 2015 S2 Lecture Notes UNSW (Page 67-73)