UC Riverside
UC Riverside Previously Published Works
Title
Prefix codes: Equiprobable words, unequal letter costs
Permalink
https://escholarship.org/uc/item/7jv1g23j
Journal
SIAM Journal on Computing, 25(6)
ISSN 0097-5397
Authors Golin, MJ Young, N
Publication Date 1996
DOI
10.1137/S0097539794268388 Peer reviewed
eScholarship.org Powered by the California Digital Library University of California
provided by eScholarship - University of California
PREFIX CODES: EQUIPROBABLE WORDS, UNEQUAL LETTER
COSTS*
MORDECAI J. GOLIN AND NEAL YOUNG
Abstract. We consider the following variant of Huffman coding in which the costs of the letters,ratherthan the probabilitiesofthewords,arenonuniform: "Givenanalphabetofrlettersof nonuniformlength, findaminimum-average.length prefix-free set ofncodewordsoverthe alphabet";
equivalently, "Find an optimal r-ary search tree with n leaves, where each leaf is accessed with equal probability but the cost to descend froma parent to its ith childdepends on i." Weshow newstructuralproperties of such codes, leadingtoanO(nlog2r)-timealgorithmfor finding them.
Thisnew algorithmissimplerand fasterthan thebest previously knownO(nr min{logn,r})-time algorithm, due toPerl, Garey, andEven[J, Assoc. Comput. Mach,, 22 (1975),pp.202--214],
Keywords, algorithms, Huffmancodes, prefixcodes, trees AMS subject classification. 68Q25
1. Introduction. Thewell-knownHuffman codingproblem
[3]
isthe following:givenasequence ofaccessprobabilities
(pl,
p2,...,Pnl,
construct a binaryprefixcode(wl,
w2,...,Wn)
minimizing the expected lengthiPi’ length(wi). A
binary prefix codeis aset ofbinarystrings, none of whichis aprefixofanother.A
natural generalization of the problem is to allow the wordsof the code to be strings over an arbitrary alphabet of r_>
2 letters and to allow each letter to have an arbitrary nonnegative length. The length ofa codeword is then the sum of the lengthsofitsletters.For
instance,the"dots
and dashes" ofMorse
codeare avariable- lengthalphabetwithlengthcorresponding to transmissiontime,(See
Figure1.)
Thisgeneralization ofHuffman coding to avariable-length alphabet has been considered by many authors, includingAltenkamp and Mehlhorn
[1]
andKarp [5].
Apparently, no polynomial-time algorithmfor it isknown, nor is it knownto beNP-hard.
A
prefix codeinwhichthecodewords(wl,
w2,...,w)
are inalphabeticalorderis called alphabetic[1], In
thiscase, theunderlying treerepresentsan r-ary search tree.Thelength of theithlettercorresponds to thetimerequired to descend
from
anode into its ith subtree. This time is often a function of in search-tree algorithms, for instance,when the subtreeto descend into ischosen bysequential search.An
optimal alphabeticcode thus corresponds to a minimum-expected-cost search tree.In
this paper, we consider the special case in which the codewords occur with equal probability, i.e., .eachp equals1In.
With this restriction, the alphabetic and nonalphabetic problems are equivalent. The problem may be viewed as a variant of Huffman coding in which the lengths of the letters, rather than the codeword probabilities, are nonuniform. Alternatively, it may viewed asthe problemof finding an optimal r-ary search tree, where the search queries areuniformly distributed but the time todescend from a parent to its ith child dependson i.For
the complexity results statedinthis paper,the algorithmsreturnatreerepresenting anoptimalcode.Receivedby the editorsMay25, 1994; accepted for publication(inrevisedform)March 1, 1995.
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong ([email protected]). Theresearch of this author was partially supportedby HK RGC Competitive ResearchgrantHKUST181/93E.
UMIACS,University of Maryland, CollegePark, MD20742. Current address: Department of ComputerScience,DartmouthCollege,Hanover,NH0375573510([email protected]). The research of this authorwaspartially supported byNSFgrantsCCR-8906949 andCCR-9111348.
1281
Depth
1 e f
a b c d
FI(. 1. Twotreesfor thesixsymbols a, b, c, d, e, andf, each occurring withprobability1/6.
Thetreeontheleftisthe optimal tree thatusesthealphabet{0,1},withlength(O) length(I) 1, while the tree on the right isfor the alphabet {.,_} with length(.) 1 and length(_) 2. The
corresponding sets ofcodewordsare
a=000, b=001, c=011, d=011, e=10, f=11
and
a= b= c
....
d= e= f....
In 1989,
Kapoor
and Reingold[4]
describedasimpleO(n)-time
algorithm forthe binarycaser 2.In
1975, Perl,Garey,
andEven[7]
gave anO(rn min{r,
logn})-time
algorithm
(though
due to a typographicalerror, their abstract incorrectly claimsanO(rn)-time algorithm).
Inthe same year, Cot[2]
describedanO(r2n)-time
algorithm.In 1971, Varn
[8]
gave an algorithm without analyzing its complexity. It appearsVarn’s
algorithm requirest2(rn)
time.In this paper, we describe an
O(n
log2r)-time
algorithm based on new insights into the structure of optimal trees. In2,
we define shallow and proper trees and prove that some proper shallow tree is optimal.In 3,
we develop the algorithm, which efficiently constructs all proper shallowtrees and returns one representing an optimal prefix code.2. Shallowtrees. Fix aninstanceof theproblem,givenbythe respectivelengths
{cl <_
c2< <_ cr)
of therlettersin thealphabet and the numbernof(equiprobable and prefix-free) codewords required.We
assume the standard tree representation of prefix codes, as described inthe following definition.DEFINITION 2.1. The infinite r-arytree is the infinite, rooted, r-ary tree. Each tree edge has alength andalabel--an edgegoing
from
anodetoitsi th child has lengthci and is labeled with the th letterin the alphabet.
A
node is a nodeof
theinfinite
r-ary tree. The finite words overthe alphabet ofr letterscorrespond to the nodes. The labels alongthepath from the root to any node spell the corresponding word and the length of the path is the length of this word.A
prefix code corresponds to a set of nodes none ofwhich is a descendant of another.(See
Figure1.)
DEFINITION 2.2.
A
tree is any subtree Tof
theinfinite
r-ary tree containingthe root. In any tree, n
of
the leaves will beidentified
asterminals; theircorrespond- ing wordsform
a prefix code. The remaining nodes in the tree arereferred
to asnonterminals.
Given a nodeu, thenotation
childi(u)
denotesu’s
ith child;depth(u)
denotes the depth(the
lengthof
the correspondingcodeword); parent(u)
denotes the parent.Thecost
c(T) of
such atreeis thesumof
the depthsof
theterminals--also called the externalweighted pathlengthof
the tree.A
proper tree is a tree in which every nonterminal has at least two children.Thegoal is tofind an optimaltree with n terminals. It is easyto seethat some optimaltree is proper; thus werestrict our attention toproper trees.
Our basic tool for understanding the structure of optimal trees is a swapping argument. For example, in any proper optimal tree, no nonterminal is deeper than any terminal. Otherwise, the terminal and the subtree rooted at the nonterminM couldbe swapped, decreasingthe averagedepth ofthe terminals.
We
use aswapping argumentto prove thatanoptimal proper tree has thefollow- ingform forsome m. Thenonterminals are the rnshallowest(i.e., least-depth)
nodes of the infinitetree, whilethe terminalsarethenshallowestavailable childrenofthese nodesin the infinite tree. Wecallsuch atree shallow; here isthe precise definition.DEFINITION 2.3.
A
treeT isshallow provided that(i) for
any nonterminal u E T and any node w(not
necessarily inT)
that isnot a nonterminal,
depth(u) <_ depth(w)
and(ii) for
any terminalu T and any node w thatis not in T but is a childof
anonterminal,
depth(u)<_ depth(w).
Notethat anonterminal ofan
(improper)
shallow treemight have nochildren in thetree. Thisis whywereferto "terminal" and "nonterminal" nodesin placeofthe morecommon "internalnodes" and "leaves."As
a simple example, consider the basic binary tree; r 2, cl c2 1.A
proper binary tree T will be shallow if and only if there is some depth such that
(a)
every nodeu in the infinite tree withdepth(u) <
is anonterminM inT
and(b)
all terminals of
T
are on levels and+
1. Conditions(a)
and(b)
are necessary andsufficient conditions for T to have minimum external path length among all binary trees with the same number of leaves; see, e.g.,
[6, 5.3.1]. So,
a binary tree has minimum external path length for its number of leaves if and only if it is shallow.For example, the binary treeon theleft-handsideof Figure 1 has minimumexternal path length among all trees with six leavesbecause it fulfills conditions
(a)
and(b)
with 2.
As
we will see later, though, for most values of r and c, shallowness alone does not imply optimality.However,
ifa shallowtree has the right number of nonterminals, then it is optimal.LEMMA
2.4. Let m* be the minimum numberof
nonterminals in any optimal tree. Then anyshallow tree withm* nonterminals is optimal andproper.Proof.
Fixa shallowtreeT withm* nonterminals. Wewillshowtheexistenceof anoptimaltree withthesamenonterminals asT. SinceT
isshallow, byproperty(ii),
this will imply that T is optimal.
By
the choice ofm*, T is also proper(otherwise
there would be an optimalpropertreewith fewer
nonterminals).
It remainsto show the existence ofan optimal treewith the same nonterminMs as T. Let T* be an optimal
(and
thereforeproper)
treewith m* nonterminals. Let N and N* be the sets ofnonterminals ofTandT*,
respectively. IfNN*,
we are done. Otherwise, let ube aminimum-depth nodein N-N*,
sothatu’s
parentis in N*. Letu* beanodeinN*-N. Note that, since Tisshallow,depth(u*) >_ depth(u)
but that, inT*, u* is anonterminal(with
at least two terminaldescendants)
whileuiseither aterminalor not present.
In T*, swapthe subtrees rooted at u and u*. Specifically, makeu a nonterminM
11
I2
FI. 2. The topofalabeledinfinite tree withr 3,cl 2, c2 2, andC3 5.
and, for each descendant v* ofu*, delete it and add the corresponding descendant v ofu. Ifv* wasaterminal, make v aterminal; otherwise, make v anonterminal. Ifu was aterminal, makeu* aterminal; otherwise, delete u*. Call the resulting tree T
.
From
depth(u*) >_ depth(u),
itfollowsthatc(T’) <_ c(T*).
ThusT’
isalso optimal.Note
that T shares one more nonterminal with T than does T*. Thus repeated swappingproduces anoptimal tree withthesame nonterminals asT.Note that m*
_> (n- 1)/(r- 1)
since each node hasdegree at most r.COROLLARY 2.5. Letmmin
[(n--1)/(r--1).
Let(rmmin,rrnmin_t_l,rrnmin_t_2,
be any sequence
of
shallow trees such thatfor
each m,Tm
hasrn nonterminals. Then oneof
theTm
is proper and optimal.The algorithmgeneratesasequenceof shallow treesasaboveand returns theone which has minimumcost. The lemma guarantees that this tree willbe optimal. The restof thepaperisdevotedto examining the properties ofshallowtrees whichenable theenumerationofthepropershallow treesin
O(n
log2r)
time.2.1. Defining the trees.
Ordering the nodes. Label the nodes ofthe infinite tree as 1,2, 3,... in order of increasingdepth. Breaktiesarbitrarily, except that if twonodesuandw are ofequal depth, both are ith children oftheir respective parents, and
parent(u) < parent(w),
then letu
<
w(this
is neededfor Lemma3.2).
Forthesake ofnotation, identifyeach node with its labelsothat 1 istheroot, 2 is a minimum-depth child oftheroot, etc.Figure 2 illustrates the top section ofsuch a labeling for r 3, Cl 2, c2 2, and c3 5. Thesevalues ofr and cj are theones we use inall later examples.
DEFINITION 2.6. For eachm
>_
mmin,define
T, to be the tree whose nontermi-nalsare
{1,..., m}
andwhoseterminals aretheminimum n nodes among the childrenof {1,...,m}
in{m +
l,m+ 2,...}.
Thus
Tm
isthe "shallowest" tree withmnonterminals withrespectto the ordering of the nodes. Since the ordering of the nodes respects depth, eachTm
is shallow.Figure3 presentsT5, T6,
TT,
andTs
for n 10 using thelabeling ofFigure 2.2.2. Relationof successive trees.
Next,
weturnourattention to the relation ofTm+l
toT,.
LEMMA
2.7. Form>_
mmin, the new nonterminal(node
rn+ 1)
inTm+l
is theminimum terminal
of Tin.
Proof.
The parent ofrn+
1 is in{1,..., m},
so m+
1 isthe minimum child of{1,..., m}
in{m +
1,m+ 2,...}.
Theresult followsfrom the definitionofT,.LEMMA
2.8. Form>_
mmin, provided the newnonterminal(node
rn+ 1)
inhas at least one child, each terminal
of Tn+l
is either a childof
m+
1 ora terminalof T..
Proof.
Let node m+
1 have d children inT,+I.
Let C denote the set of children of nodes{1,... ,m}
in{m +
1,m+ 2,...}.
The terminals of treeT,+I
consist of the minimum d children of node m/ 1 together with the minimum n d nodes in C{m + 1}.
Thesen d nodestogether withnodem+
1(the
minimumnode inare the n d
+
1 minimum nodes in C. Ifd>_
1, then by the definition ofT,,
each such node is a terminalinTm.
The main significance ofLemmas 2.7and 2.8 is that they will allow an efficient constructionof
Tn+l. Moreover,
they implythat ifT,
isnot proper, then neitheris anysubsequent tree.LEMMA 2.9. One
of
the trees{Tmmin,Tmmin+l,...,Tmmax
is optimal and proper, wheremmax min{m T,+I
is improper}.Proof. By
Lemma 2.8, ifT,
is improper, thensois T,+l--eithernodem+
1 hasno children in
T,+I
or the nonterminal inT,
that had less than two children also has less thantwo children inT,+I. Hence,
for each m>
mmax, treeT, is improper.ThusCorollary2.5 implies that oneofthe trees
(Trnmi.,
Trnmin-}-l,.Trnmax
is properand optimal. [l
For n 10 mmin -5:-i-10--1-] 5 and
(as
shown in Figure3) Ts
is improper.The lemmathen implies thatone of
T5,
T6,orT7
must haveminimum externalpathlength. Calculation shows that
T
withc(T6)
59 isthe optimalone.3. Computing the trees. The algorithm uses thefollowing two operations to computethetrees.
ToSPROUT&treeisto makeits minimumterminalanonterminalandto add the minimum childofthis nonterminal as a terminal.
ToLEVEL atree is toaddc childrenofthemaximumnonterminal to the tree as terminals and to remove the c largest terminals in the tree. The c children are the
Depth
32456
1 2 3
u[i]
3 3 1w[i]
5 5 41 2 3 4 3 1 6 6 3
Depth
Depth1
69
1 2 3
4 4 1
u[i]
7 7
e
1 2 3 4 4 2 8 7 2
FIG. 3. The trees T5, T6, TT, andTs forr 3, cl 2, c2 2, C3 5, andn 10. The
node numbering is that ofthe previousfigure. Calculating the external path lengths, wefindthat c(T5) 60, c(T6) 59, c(TT) 60, andc(Ts) 62.
minimum c children not yet in thetree, where c is maximum such that all children added are less thanall terminalsdeleted.
The algorithm computes theinitialtree
Tmmin
and then repeatedlySPROUTS
andLEVELS to obtain successive trees until the tree so obtained is not proper. Lemmas 2.7and 2.8 implythat, aslongasnodem
+
1 has at leastone child inTm+l (it
will ifTm+l
isproper),
SPROUTing and LEVELingTm
yieldsTm+l.
Figure 4illustrates this operation.OBSERVATION
3.1.Let
mrnmax. If
node m+
1 has at least one child inTm.+l
then SPROUTing and LEVELing
Tm
yields treeTm+. If
node m+
1 has no childrenin
Tm+,
then the maximum terminal inTm
is less than the minimum childof
nodem
+
1 and SPROUTing and LEVELingTm
yields a tree in which nonterminal m+
1has one child.
Hence
the algorithm always correctlyidentifies Ttmax
and terminatesDepth
32456
Depth
Depth
T6 LEVEL(SPROUT(T5))
FIG. 4. SPROUTing and LEVELingT5 yieldsT6.
correctly, having consideredallrelevant trees.
To SPROUTrequires identification andconversionof theminimumterminalofthe current tree, whereas to LEVEL requires identification and replacement of
(no
morethan
r)
maximum terminals by children ofthe new nonterminal. One could identify the maximum and minimum terminals inO(logn)
time by storing all terminals in two standard priority queues(one
to detect the minimum, the other to detect themaximum). At
most r terminals would be replaced in computing each tree and, becausemmax <_
n- 1, onlyO(n)
treeswould be computed. This approach yieldsanO(rn
logn)-time
algorithm.By
a morecareful use ofthestructure of the trees, weimprove this in two ways.First, we give an amortized analysis showing that in total, only
O(n
logr),
ratherthan
O(rn),
terminals are replaced. Second, we show how to reduce the number of nonterminals in each priority queue to at most r. This yields anO(n
log2r)-time
algorithm.
Both improvementsfollow from thetie-breaking conditiononthe ordering ofthe nodes, whichguarantees that
T,
must have the following structure.LEMMA
3.2.In
anyTin,if
u and w are nonterminals withu<
w and the ith childof
w is in the tree, then so isthe th childof
u.If
the th childof
w is anonterminal, then so is the th childof
u.Proof.
The proofis straightforward from the definition ofTm
and the condition on breakingties in ordering the nodes(in 2.1).
COROLLARY
3.3. Nodem has a minimum numberof
children among all nonter- minals inTin.
3.1. Only
O(n
logr)
replacements total. The numberofterminals replaced while obtainingTm
fromTin-1
is at most the number ofchildren of nonterminal m inT,.
Although this might be r for manym, the sumof the numbers ofchildren isO(nlogr).
LEMMA
3.4.Let dm
be the numberof
childrenof
nonterminal m in treeTm.
Then
r dm
isO(n
logr).
Proof. By
Corollary 3.3, within Tin, node m has the fewest children. The total number ofchildrenofthemnonterminalsis m+
n-1. Thusdm
is atmost theaverage(rn +
n-1)/m
1+ (n- 1)(1/m).
mmx
n--1
n
1.Theresult follows frommmin
r--’L--fl
andmma
x3.2. Limiting the relevant terminals.
To
reduce the number of terminals thatmustbe considered infindingthe-minimumandmaximumterminals,we partition the terminals into r groups. The ith group consists of the terminals that are ith children(i
1,...,r).
LEMMA
3.5. If any Tin,for
any i, the setof
nonterminals whose ith children are terminals isof
theform {u, u +
1,...,w} for
someu
andw.
The minimumamong terminals thatareith childrenis
child(u) (the
ith childofu).
The maximumamong these terminals is
child(w).
Proof.
Thisis a straightforward consequence ofLemma
3.2. E]Figure 3 presentsui and wi for the trees
T5,
T6,TT,
andTs
when n 10.This lemma implies that the minimum terminal in
Tm
is the minimum among{child(u) 1,...,r}. Our
algorithm finds the minimum terminal inT
by maintaining these r particular children(rather
than all nterminals)
in a priority queue. This reduces thecost of finding the minimumfromO(log n)
toO(log r).
Sim-ilarly, the algorithm finds the maximum terminal in
O(logr)
time by maintaining{child(w)
:i 1,...,r}
in an additional priority queue.OBSERVATION
3.6.1 As
an aside, one can prove usingLemma
3.5 that,for
anym such that mmin
<
m<
mmax,c(Tm+l)- c(Tm) >_ c(Tm)- c(Tm_l)..
That is, the sequenceof
tree costs is unimodal. To prove this, consider buildingTm+l from Tin.
SPROUTingincreases thecost bycl; LEVELing decreases the costwith eachswap.For each swap in building
Tm+l from
Tin, one can show there was a corresponding swap in buildingTm from Tin-1
and that the decrease in cost(from Tm
toTin+!)
due to the
former
is bounded by the decrease in cost(from Tin-1
toTin)
due to the latter. Thus, in practice, the algorithm could bemodified
to stop and returnTr-I
when
c(Tm) >_ c(T,_I).
This observationisdue toR.Fleischer.
3.3.
The
algorithmindetail. The full algorithm hastwo distinct phases. The first phase constructs the base treeTmmin.
The second phase starts withTmmin
and,by SPROUTing and LEVELing, iteratively constructs thesequence of shallowtrees
(Tmmin,Tmin-l,Tmmin22,’’", Tmax
andreturnsonewhichhassmallest externalpathlength.
Tmmx
is the lastpropertree in the sequence, i.e.,Tmmx+1
is improper.Lemma
2.9 guarantees that the algorithm returns an optimal tree.We
now describe how to implement the first part of the algorithm inO(n
logr)
time and the second inO(n
log2r)
time; the full algorithm thereforeruns inO(n
log2r)
time.Theskeletonof the final algorithmisshowninFigure 5. Procedure
CREATE-Tmmin
creates tree
Tmmi,,
the variableC
contains the external path length of current tree Tin,and mDeg
contains the number ofchildrenof node mintreeTin. As
presented, the algorithm computes only the cost of an optimal tree. It can easily be modified to compute the actualtree. Note that to check that the current treeTm
is proper,by Observation 3.1 and Corollary3.3, it suffices tocheck that nonterminal m has at least twochildren.
COMPUTE-TREES(<Cl,
C2,...Cr> )
I.
CREATE-Tmmin
2. WHILE (mDeg :> 2) DO
Compute Tin+ from Tm--
3.
SPROUT(T)
4.
LEVEL(T)
5.
Cmin - min{C, Cmin }
6.
RETURN Cmin
FIG. 5. Algorithmtofind anoptimalvariable-length prefixcode.
Theroutines
SPROUT
andLEVEL
are shown inFigure 6.Recall that the nodes
of
the infinite tree are labeled in order ofincreasing depth with ties broken arbitrarily except for the requirement that ifu and v are both of equal depth and both are ith children oftheir respective parents, then u<
v if and onlyifparent(u) < parent(v).
Dependingupon c,c2,...,cr, theremay be many such labelings. The algorithm we present breaks ties lexicographically--suppose u and v have the samedepth andletuchild(u’)
andvchildj(v’);
thenu<
vifandonly ifu’ < v’ (or
u v and<
j). Figure 2 illustrates this labeling for r 3, cl 2,c2 2, andc3 5. The sequence of shallow treesisfully determined bythislabelling.
Figure 3illustrates theshallowtreeswith 10 nonterminals for theser andcvalues.
The algorithm represents the current tree
Tm
withthefollowing data structures:N
isthenumber ofterminals.misthenumber ofnonterminals.
It
isalso the rank of themaximum nonterminal.C
isthe sum ofthedepths of the terminals.mDeg
isthenumber ofchildren of nonterminalm.D[u]
isthedepth ofeachnonterminal u.u[i]
isthe rank oftheminimum nonterminal(if any)
whoseith childis aterminal(
<_i<_r).
w[i]
isthe rank of themaximumnonterminal(if any)
whoseith childis aterminal(1 _ i<_ r).
Ifno nonterminal has aterminal ith child, thenu[i] > w[i].
Spaou(T)
--Make the minimum terminal a nonterrninat-- 1. na+--m
+
1;2. Let
childi(u[i])
be the minimum terminal in low-queue.3.
Dim]
+--D[u[i]] +
ci;u[i] u[i] +
1;UPDITE-QS(T, i)
4. C +-C-
Dim]; mDeg
+- 0;--Add smallest child as a terminal-- 5.
ADD-TEPMINAL(T)
LEVEL(T)
1. WHILE
(mDeg <
r andchildmDeg+l(m)
is less thanthe max. terminal
childi (w [i]
in high-queue) DO2.
ADD-TERMINAL(T)
--Delete the maximum terminal--
3. C C-
(D[w[i]] + ci)
4.
w[i] - w[i]-
1;UPDATE-Qs(T, i) ADD-TERMINAL(T)
1.
mDeg
+-mDeg +
1; C C+ Dim] +
CmDeg;2.
w[mDeg] -
m;UPDATE-Qs(T, mDeg)
FIG. 6. OperationsSPROUT andLEVEL.
low-queue is a priority queue for finding the minimum terminal. It contains
{childi(u[i]) u[i] < w[i]}.
high-queue is a priority queue for finding the maximum terminal. It contains
{childi(w[i]) u[i] < w[i]}.
For anexample, referback to Figure 3. Tree
Ta
hasN 10, C 59,
mDeg
2,D[1]
0,D[2]
2,D[3]
3,D[4]
4,D[5]
4,D[6]
4,u[1]
4,u[2]
3,u[3]
1,will
6,w[2]
6,w[3]
3,low-queue
{child1 (4), child2(3), child3(1)},
high-queue
{child (6), child,.(6), child3(3)}.
The priority queues are maintained as follows. Ingeneral, a terminal in
T,
can have rank(label)
arbitrarily larger than m. The algorithm explicitly maintains the ranks and depths ofthem nonterminalsinthecurrent tree; the algorithm compares the ranks ofterminals in the priority queuesviathe ranks and depths oftheir(non- terminal)
parents. Whenu[i]
orw[i]
changestoreflecta newcurrent tree, the queues are updated bythefollowing routine:UPDATE-Qs(T, i)
1. IF
(u[i] <_ w[i])THEN
2. Update
childi(u[i])
in low-queueandchildi(w[i])
inhigh-queueto maintainthequeues’ invariants.
3. ELSEDelete both nodes from their respective queues.
Line 2 replaces the old
childi(u[i])
in low-queue(childi(w[i])
in high-queue)by thenew onewhen
u[i] (w[i])
changes. Line3 willonlybeexecutedifchildi(u[i]) >
childi(w[i]),
which will only happen if the tree no longer contains any ith child as a terminal. Note that Lemmas 2.8 and 3.2 imply that if, for some and Tin, no nonterminM has an ith child inT,,
then no nonterminal has an ithchild inT,+I.
Construction
of
thefirst
tree. TreeTmmin
hasasimple-structure. Itsnonterminals are the nodes(1,2,..., mmin/.
Its terminals are the n shallowest children of nodes(1,2,..., ?Ttmin).
TO construct Tmi, we assume that n
>
r; otherwise,Tmi
is simply the rootandits firstnchildren. For 1 m
<
mmin,defineT
tobethetree with nonterminMs{1,..., m}
and allof the(r 1)m +
1 children of{1,..., m}
asterminals. TheproofofLemma2.7generMizes easily to thesetrees; node m
+
1 is the minimum terminM ofT.
CREATE-Tmmin (T)
Create
T1--
1. ?Ttmin
[n--l___] D[1] -
0; C--E min{r’n}i__l
2.
CREATE
low-queue;CREATE
high-queue;3. FOR 1 to
rain{r, n}
DO6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
--Create
(T2,
T3,...,Tmmi }-
FOR m 2 to
(retain 1) DO
Let
child(u[i])
betheminimumterminal in low-queue.Dim] - D[u[i]] +
c;u[i] - u[i] +
1;UPDATE-Qs(T, i);
FORj 1 to r DO
w[j] -
m; UPDATE-QS(T,j);C
-
CDim] + Ej=I (Dim] + cj);
--Create
Tmmin
m ?7min; / T-
(r 1)(mmin 1);
Let
childi(u[i])
bethe minimum terminal in low-queue.Dim]
+--D[u[i]] +
ci;u[i]
+--u[i] +
1;UPDATE-QS(T, i)
FORj=lto A DO
w[j]
+--m;UPDATE-QS(T,
j);C C
Dim] + (Dim] + cj);
mDeg A;
LEVEL(T);
FIG. 7. OperationCREATE-Tnmi
The tree
T1
is easy to construct. It is the tree with 1 root and r children. In- ductively construct thetreeT,
from the treeT,-I,
rn<
mmin, aSfollows: find the minimum terminal inTn
by taking theminimum terminalout of low-queue. Label this node rn, make it a nonterminal, and add all ofits children toT,
as terminals.The details are shown in Figure 7.
Finally, construct
T?Ttmin
fromTmmin__ by makingthe lowest terminalofT,min_l into node mmin. Add the n-(r- 1)(mmin 1)
minimum children ofnode mmin aSterminals, bringing the total number of terminals in thecurrent tree to n. Level the resultingtree.
Since only
O(n/r)
trees are constructed while computingTmin
and each treecan be constructed from the previous tree in
O(r
logr)
time, the time required tocompute
T?Ttrni
isO(n
logr). (If
desired, thetimefor eachtreeTm
with rn<
mmincanbe reducedto
O(log r)
because maximum terminals are not replaced in constructing such atree.)
Construction
of
the remaining trees. The algorithm constructs the sequence of trees(TTl’min,Tmin+[,T%min+2,’’’, T’tmax
as described previously. Tree
T,
is found by SPROUTing and then LEVELing its predecessorTin-1.
The cost isO(dm
logr)
time, wheredm
isthe number ofchildren of the new nonterminM rn inT,. By Lemma
3.4, this part ofthe algorithm runs in 0((Era din)
logr) O(n
log2r)
time.Acknowledgments. The authors would liketothankDr. JacobEccofor intro- ducingustotheMorsecodepuzzle,whichsparkedthis investigation. They would also like to thank Siu Ngan Choi and RudolfFleischer
(who
made Observation 3.6the unimodality ofthe treecosts)
for their careful reading ofan earlier manuscript and subsequent comments.REFERENCES
[1] D. ALTENKAMP AND K. MELHORN, Codes: Unequalprobabilies, unequal letter costs, J. Assoc.
Comput. Mach., 27(1980), pp. 412-427.
[2] N. COT, Complexityofthe variable-lengthencodingproblem, inProc.6thSoutheastConference onCombinatorics, Graph Theory, and Computing, CongressusNumerantiumXIV, Utilitas MathematicPublishing, Winnepeg,MB, Canada, 1975,pp. 211-244.
[3] D. A. HUFFMAN, A methodforthe constructionofminimum redundancy codes,Proc. IRE, 40 (1952),pp. 1098-1101.
[4] S.KAPOORAND E. M. REINGOLD, Optimum lopsided binarytrees, J. Assoc. Comput. Mach.,36
(1989), pp. 573-590.
[5] R. KARP, Minimum-redundancy codingfor the discrete noiseless channel, IRE Trans. Inform.
Theory, IT-7 (1961),pp. 27-39.
[6] D. E. KNUTH, TheArtofComputerProgramming, VolumeIII:SortingandSearching,Addison- Wesley, Reading,MA, 1973.
[7] Y.PEPL, M. R. GAREY, ANDS. EVEN,Efficientgenerationofoptimalprefixcode: Equiprobable words usingunequal costletters,J. Assoc. Comput.Mach.,22 (1975), pp. 202-214.
[8] B. VAtN, Optimalvariable lengthcodes (arbitrarysymbolcost andequalcodewordprobability), Inform. Control, 19 (1971), pp. 289-301.