The validity of Huffman’s algorithm - Average code word length and Huffman’s algorithm

Coding Theory

4.3 Average code word length and Huffman’s algorithm

4.3.1 The validity of Huffman’s algorithm

In this section we will try to lead whoever is interested through a proof of the validity of Huffman’s algorithm. In fact, we will prove more: not only does Huffman’s algorithm always give a “right answer,” but also every “right answer,” in case there is more than one, as in problem 4, above, can be obtained by some instance of Huffman’s algorithm. By a “right answer” here we do not mean any actual prefix-condition encoding scheme which minimizes $\bar{\ell}$, but rather the sequence of lengths of the code words in such an encoding scheme.

(That Huffman’s algorithm always produces a prefix-condition scheme is quite easy to see; we leave it to the reader to work through the proof.)

There is a concise proof of the validity of Huffman’s algorithm in the binary case, in Huffman’s original paper [36], and this proof can be easily extended to prove the stronger statement given here, when $n = 2$. However, there are some unexpected difficulties that crop up when $n > 2$ that appear to necessitate a much longer proof. We have not seen a proof for $n > 2$ elsewhere. Both Huffman [36] and Welsh [81] give proofs for $n = 2$ and dismiss the cases $n > 2$ as similar. Jones [37] notes that the case $n > 2$ is significantly different from the case $n = 2$ but does not give a proof for $n > 2$.

Thanks are due to Luc Teirlinck for several of the observations on which the proof given here is based. Even more thanks are due to Heather-Jean Matheson, who, while an undergraduate at the University of Prince Edward Island, discovered a serious error in the purported proof in the first edition of this text. (She not only noticed that the logic of a certain inference was wrong, she demonstrated that it could not be made right, by giving a beautiful example. Unfortunately, it would take us too far afield to explain that example here.) Yet further portions of gratitude are due to Maxim Burke for elegantly fixing the error, in a way that improves the entire proof. The statements and proofs of Propositions 4.3.8 and 4.3.9 are entirely due to him.

Recall, from Exercise 4.2.3, that an $n$-ary Huffman sequence is a sequence $\ell_1 \leq \cdots \leq \ell_m$ of positive integers such that there is a prefix-condition encoding scheme $s_j \to w_j \in A^{\ell_j}$, $j = 1, \ldots, m$, for encoding an $m$-letter source alphabet $S$ with an $n$-letter code alphabet $A$, minimal in the sense that if any of the $\ell_j$ is reduced by one and the new sequence is denoted $\ell'_1, \ldots, \ell'_m$, there is no prefix-condition scheme $s_j \to w'_j \in A^{\ell'_j}$, $j = 1, \ldots, m$. [Convention: $A^0 = \emptyset$.]

Notice that, given relative source frequencies $f_1 \geq \cdots \geq f_m > 0$, any sequence $\ell_1 \leq \cdots \leq \ell_m$ of code word lengths for a prefix-condition scheme $s_j \to w_j \in A^{\ell_j}$, $j = 1, \ldots, m$, which minimizes $\bar{\ell} = \sum_{j=1}^{m} f_j \ell_j$ is an $n$-ary Huffman sequence. (Why? In fact, the converse is true, as well: every $n$-ary Huffman sequence is the sequence of code word lengths in a prefix-condition scheme for $S \to A$ that minimizes $\bar{\ell}$ with respect to some sequence $f_1, \ldots, f_m$ of relative source frequencies. See Exercise 4.3.6. But we will not need this fact here.)
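The definition of an $n$-ary Huffman sequence can be tested mechanically. The following is a minimal Python sketch, not taken from the text. It assumes Kraft's Theorem (Theorem 4.2.6) in the form that positive integer lengths are the code word lengths of some prefix-condition scheme exactly when $\sum_j n^{-\ell_j} \leq 1$. The function names `kraft_sum` and `is_huffman_sequence` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def kraft_sum(lengths, n):
    """Exact value of the Kraft sum: sum over j of n**(-l_j)."""
    return sum(Fraction(1, n ** l) for l in lengths)


def is_huffman_sequence(lengths, n):
    """Check the minimality condition in the definition.

    Assumption: a prefix-condition scheme with the given positive lengths
    exists if and only if the Kraft sum is at most 1.
    """
    if any(l < 1 for l in lengths) or kraft_sum(lengths, n) > 1:
        return False
    for j, l in enumerate(lengths):
        if l == 1:
            # Reducing to length 0 never gives a scheme (convention A^0 = empty set).
            continue
        reduced = list(lengths)
        reduced[j] = l - 1
        if kraft_sum(reduced, n) <= 1:
            return False  # a shorter scheme exists, so the sequence is not minimal
    return True
```

For instance, with $n = 2$ the sequence 2, 2, 2 is rejected, since one length can be lowered to give 1, 2, 2, while 1, 2, 2 itself is accepted. Exact rational arithmetic is used so that a Kraft sum equal to 1 is never misjudged through rounding.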

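By Kraft’s Theorem (Theorem 4.2.6), a sequence $\ell_1 \leq \cdots \leq \ell_m$ of positive integers is an $n$-ary Huffman sequence if and only if it is minimal with respect to satisfying Kraft’s Inequality, $\sum_{j=1}^{m} n^{-\ell_j} \leq 1$. Since diminishing the largest of the $\ell_j$ increases the sum $\sum_{j=1}^{m} n^{-\ell_j}$ the least, it follows that $\ell_1 \leq \cdots \leq \ell_m$ is an $n$-ary Huffman sequence if and only if

$$\sum_{j=1}^{m} n^{-\ell_j} \leq 1 < \sum_{j=1}^{m-1} n^{-\ell_j} + n^{-(\ell_m - 1)}.$$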

[...] $K$ is a multiple of $n$. Since $K \geq 1$, this establishes the last conclusion of the proposition, that $\ell_i = L$ for the last $n$ values of $i$. It remains to be shown that $m = n + k(n - 1)$.

By the induction hypothesis, $m' + a = n + k(n - 1)$ for some non-negative integer $k$. Thus $m = m' + an = n + (k + a)(n - 1)$, which has the desired form.

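4.3.9 Proposition If $1 \leq \ell_1 \leq \cdots \leq \ell_m = L$ is an $n$-ary Huffman sequence and $m = n + k(n - 1) + t$, where $k$ is a non-negative integer and $1 \leq t \leq n - 1$, then

$$\sum_{i=1}^{m} n^{-\ell_i} + \frac{n - 1 - t}{n^{L}} = 1, \quad \text{and} \quad \ell_{m-t} = \cdots = \ell_m = L.$$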
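As a sanity check rather than a proof, both conclusions of the proposition, namely the identity $\sum_{i=1}^{m} n^{-\ell_i} + (n - 1 - t)/n^{L} = 1$ and the equality of the last $t + 1$ lengths, can be tested by brute force on small cases. The sketch below is an illustration only. It assumes the Kraft-sum form of the Huffman sequence definition, and the names `kraft_sum`, `is_huffman_sequence`, and `check_proposition` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def kraft_sum(lengths, n):
    """Exact value of the Kraft sum: sum over i of n**(-l_i)."""
    return sum(Fraction(1, n ** l) for l in lengths)


def is_huffman_sequence(lengths, n):
    """Kraft sum at most 1, and no single length can be lowered by one."""
    if kraft_sum(lengths, n) > 1:
        return False
    for j, l in enumerate(lengths):
        if l == 1:
            continue  # length 0 is excluded by convention
        reduced = list(lengths)
        reduced[j] = l - 1
        if kraft_sum(reduced, n) <= 1:
            return False
    return True


def check_proposition(n, m, max_len):
    """Verify both conclusions for every non-decreasing n-ary Huffman
    sequence of m lengths drawn from 1..max_len; return how many were checked.
    """
    assert m > n >= 2
    # Write m = n + k(n-1) + t with k >= 0 and 1 <= t <= n-1.
    t = (m - n - 1) % (n - 1) + 1
    checked = 0
    for seq in combinations_with_replacement(range(1, max_len + 1), m):
        if not is_huffman_sequence(seq, n):
            continue
        L = seq[-1]
        # First conclusion: padding with n-1-t extra words of length L fills the Kraft sum.
        assert kraft_sum(seq, n) + Fraction(n - 1 - t, n ** L) == 1
        # Second conclusion: the last t+1 lengths all equal L.
        assert all(l == L for l in seq[m - t - 1:])
        checked += 1
    return checked
```

For example, with $n = 3$ and $m = 4$ we have $k = 0$, $t = 1$, and the Huffman sequence 1, 1, 2, 2 has Kraft sum $8/9$, to which the term $(3 - 1 - 1)/3^2 = 1/9$ adds exactly the missing amount.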

Proof: The second conclusion follows from the first and Proposition 4.3.8, applied to the longer sequence $\ell_1 \leq \cdots \leq \ell_{m+n-1-t} = L$.

If $r \geq n - 1$ then $\sum^{m-1}$ [...]

Corollary 4.3.10 allows us to provide a relatively easy proof by induction on $m$ that if $f_1 \geq \cdots \geq f_m > 0$, $\sum_{i=1}^{m} f_i = 1$, and integers $\ell_1 \leq \cdots \leq \ell_m$ satisfying $\sum_{i=1}^{m} n^{-\ell_i} \leq 1$ minimize $\sum_{j=1}^{m} f_j \ell_j$, then some instance of Huffman’s algorithm applied to $f_1, \ldots, f_m$ with respect to a code alphabet $A$ with $n$ letters will produce an encoding scheme with code word lengths $\ell_1, \ldots, \ell_m$. In these circumstances, if $m \leq n$ we must have $\ell_1 = \cdots = \ell_m = 1$ and Huffman’s algorithm trivially gives the desired result. So suppose that $m = n + k(n - 1) + t$ for some integers $k \geq 0$ and $t \in \{1, \ldots, n - 1\}$; we go by induction on $m$. Note that although there may well be different instances of Huffman’s algorithm applicable to $f_1, \ldots, f_m$, based on different merging choices in the “merge” part of the algorithm, the first merge will invariably merge the $t + 1$ source letters $s_{m-t}, \ldots, s_m$ into a letter $\sigma$, which will be given relative frequency $\sum_{j=m-t}^{m} f_j$. Let $L = \ell_m$. By the previous observation that $\ell_1 \leq \cdots \leq \ell_m$ is an $n$-ary Huffman sequence and Proposition 4.3.9, we have that $\ell_m = \cdots = \ell_{m-t} = L$, and by Corollary 4.3.10, the non-decreasing rearrangement of $\ell_1, \ldots, \ell_{m-t-1}, L - 1$ is an $n$-ary Huffman sequence.

We verify that these code word lengths minimize the average code word length of a possible prefix-condition code for $S' = \{s_1, \ldots, s_{m-t-1}, \sigma\} \to A$ with respect to the relative frequencies $f'_1 = f_1, \ldots, f'_{m-t-1} = f_{m-t-1}$, $f'_{m-t} = \sum_{j=m-t}^{m} f_j$. Suppose that $\ell'_1, \ldots, \ell'_{m-t}$ are positive integers such that $\sum_{j=1}^{m-t} n^{-\ell'_j} \leq 1$ and [...] $\sum_{j=1}^{m-t} n^{-\ell'_j} \leq 1$, showing that there is a prefix-condition encoding scheme for $S \to A$ with code word lengths $\ell'_1, \ldots, \ell'_{m-t-1}, \ell'_{m-t} + 1, \ldots, \ell'_{m-t} + 1$. By the induction hypothesis, there is an instance of Huffman’s algorithm resulting in a prefix-condition scheme for $S' \to A$ with code word lengths $\ell_1, \ldots, \ell_{m-t-1}, L - 1$ for $s_1, \ldots, s_{m-t-1}, \sigma$, respectively. Let $u$ denote the word of length $L - 1$ assigned to $\sigma$ in this encoding scheme. Then $ua_1, \ldots, ua_{t+1}$ will be the words of length $L = \ell_{m-t} = \cdots = \ell_m$ assigned to $s_{m-t}, \ldots, s_m$ in the scheme obtained by the instance of Huffman’s algorithm consisting of preceding that for $S' \to A$ by merging $s_{m-t}, \ldots, s_m$. Thus some instance of Huffman’s algorithm results in an encoding scheme for $S \to A$ with code word lengths $\ell_1, \ldots, \ell_m$.
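The merging procedure used in this argument can be sketched in code. The following Python function is a minimal illustration under stated assumptions, not the text's own formulation of the algorithm. When $m = n + k(n - 1) + t$ with $k \geq 0$ and $1 \leq t \leq n - 1$, the first merge combines the $t + 1$ least frequent letters, and every later merge combines $n$ letters. The name `huffman_lengths` is hypothetical, and ties are broken by insertion order, which selects one particular instance of the algorithm among the possible ones.

```python
import heapq


def huffman_lengths(freqs, n):
    """Code word lengths produced by one instance of n-ary Huffman merging.

    freqs: positive source frequencies (in any order); n: code alphabet size, n >= 2.
    Returns the lengths in the same order as freqs.
    """
    m = len(freqs)
    if m <= n:
        return [1] * m  # every source letter gets a single code letter
    lengths = [0] * m
    # Heap entries: (frequency, unique tie-breaker, source letters below this node).
    heap = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    next_id = m
    # Write m = n + k(n-1) + t with 1 <= t <= n-1; the first merge takes t+1 letters.
    t = (m - n - 1) % (n - 1) + 1
    size = t + 1
    while len(heap) > 1:
        merged_freq, leaves = 0, []
        for _ in range(size):
            f, _tie, group = heapq.heappop(heap)
            merged_freq += f
            leaves += group
        for i in leaves:
            lengths[i] += 1  # each merge adds one code letter to the words below it
        heapq.heappush(heap, (merged_freq, next_id, leaves))
        next_id += 1
        size = n  # all merges after the first combine n letters
    return lengths
```

After the first merge the number of letters is $n + k(n - 1)$, and each later merge removes $n - 1$ of them, so the loop ends with a single merged letter. With integer frequencies 5, 4, 3, 2, 1, 1 and $n = 3$ (so $m = 6$, $k = 1$, $t = 1$), the first merge combines the two letters of frequency 1, in agreement with the observation that the first merge always involves $s_{m-t}, \ldots, s_m$.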

It remains to show that every instance of Huffman’s algorithm produces an optimal encoding scheme, with respect to the given source frequencies. In view of what has already been shown, this task amounts to showing that different instances of Huffman’s algorithm applied to relative source frequencies $f_1 \geq \cdots \geq f_m$ result in schemes with the same average code word length. We leave the details of this demonstration to the reader. Go by induction on $m$, and use the observation that if $m = n + k(n - 1) + t$, $k \geq 0$, $1 \leq t \leq n - 1$, then every instance of Huffman’s algorithm applied to $f_1 \geq \cdots \geq f_m$, up to switching the order of source letters with equal source frequencies, starts with the merging of $s_{m-t}, \ldots, s_m$ into a new letter with relative frequency $\sum_{j=m-t}^{m} f_j$ [...] that Proposition 4.3.9 can be used for part of the proof, and provides the funny corollary that if the two inequalities above hold, then the leftmost is equality.)

In document Introduction to Information Theory and Data Compression (Page 94-98)