Quantizers for Data Compression and
Pattern Matching using Sorted Codebooks
John Leis
http://www.usq.edu.au/users/leis/
March2,2004
FacultyofEngineering&SurveyingTechnicalReports
ISSN 1446-1846
ReportTR-2004-01
ISBN 1877078077
FacultyofEngineering&Surveying
UniversityofSouthernQueensland
ToowoombaQld4350Australia
http://www.usq.edu.au/
Purpose
The Faculty of Engineering and Surveying Technical Reports serve as amechanism for disseminating results from
certainofits research anddevelopment activities. The scope of reports inthis seriesincludes (butisnot restricted
to): literature reviews, designs, analyses,scientic andtechnicalndings,commissionedresearch,and descriptions
ofsoftwareandhardwareproducedbystaoftheFaculty ofEngineering andSurveying.
LimitationsofUse
TheCounciloftheUniversityofSouthernQueensland,itsFacultyofEngineeringandSurveying,andthestaofthe
UniversityofSouthernQueensland: (1)donotmakeanywarrantyorrepresentation, expressorimplied,withrespect
totheaccuracy,completeness,orusefulnessoftheinformationcontainedinthesereports; or(2)donotassumeany
Abstract
Vectorquantization(VQ)hasfoundapplicationinverylow-ratespeechandimageencoding,together
with content-basedmultimediaretrieval. In order to reduce the average distortion, large codebooks
are required to store suÆcient representative source vectors. The fundamental drawback with large
codebooks is the search time required at the encoder. In order to reduce this search time, several
methods havebeen proposed in the literature. This note describesan improvement onthe so-called
triangle-inequality basedsearch. Experimentalresults are presented to demonstratethe usefulnessof
the method,whichis able to reduce search times to as little as 10-20% of the full-search equivalent
methodundercertaincircumstances.
1 Background
Vectorquantization(VQ) isatechniqueformatchingavector(ormatrix)ofdata tothenearestmatch in
aprecomputedcodebookofrepresentativedata[1]. Itndsapplicationinlow-rateencodingforspeechand
images,andmaybeusedinpatternrecognitionapplicationsforsearchingatemplatedatabaseforthenearest
match toagiven featurevector. Oneoftheproblems withVQisitsveryhighcomputationalcomplexity.
Since the trainingphase is usually done separately, the complexity problem is not of particular concern
when designing the codebook. However, when encoding vectors, the encoding time canbe a signicant
problem,especiallyforreal-timeencoders.
SeveralcomputationallyeÆcientVQsearchalgorithmshavebeenproposedinthepast. Thecomputational
reductionofthese is somewhatdependentonthe natureof thesourcedistribution,and usuallyinvolvesa
tradeobetweensearchtimeandmemoryrequirements.
Due tothenature of thesearchmethod, severalsimplicationsmaybemadeto thesearch process. More
advancedmethods,whichareabletogiveamoresignicantreductioninthesearchtime,utilizeprecomputed
parametertables.
2 Vector Quantization
Inavectorquantizer, acodebookC ofsizeNkmapsthek-dimensionalspaceR k
ontothereproduction
vectors(alsocalled codevectors orcodewords):
Q:R k
!C : C ,(c
1 ;c
2 ;;c
N ;) T ; c i 2R k (1)
Thecodebookconsistsofthecodevectorsc
i
: i=0;:::;N 1. Thecodebookvectorsareselectedthrough
aclusteringortrainingprocess,involvingrepresentativesourcetrainingdata.
Encoding orpatternmatching for ablockof k samplesin vectorxconsists ofsearchingthecodebookfor
an entry c
i
that is theclosest approximationfor the current inputvectorx. The encoder minimizes the
distortiond() togivetheoptimalestimatedvectorx :^
^
x = argmin
c i 2C d(x;c i ) (2)
Theindexithusderivedconstitutesthebest(minimumerror)VQrepresentationy=c
i
oftheinputvector
x. TheusualchoiceforthedistanceminimizationistheEuclideandistancemetric:
d
2
(x;y ) = kx
i y k 2 = K 1 X j=0 (x j y j ) 2 (3)
where x is the source vector, y = c
i
is each codevectorin turn, and C = fc
i g
N 1
i=0
3 Search Methods
Adirect calculationofthedistancemetricforeachcodevectorinthecodebookforagivensourcevectoris
generallytermedanexhaustivesearch. AnumberofsimplicationsmaybemadetotheVQsearchalgorithm
toreduce thecomplexity. Thesefallinto thecategoriesof:
Partial-distance searchtechniques. Thisapproachaimstoeliminatecertaincodevectorsfromthesearch
before the full distortion is calculated. The computational reduction attainabledepends upon how
earlythecodevectormaybeeliminatedasacandidatein thecurrentsearch.
Algebraicsimplications. This approachusesamathematicalsimplicationofthedistortionmetricto
both precalculatecertain constantsand apply preliminary tests to removecertain codevectorsfrom
candidature.
RegionofInterest (ROI) partitioning. Byconsidering the search process in R k
, aregion of interest
maybeusedtoprunethesearchtoamuchsmallernumberofadmissablecandidatecodevectors. This
method isthemostpowerfulofall,butrequiresaverylargememoryspace.
Theexactreductionattainableis somewhatdependentuponthesourcecharacteristics. Thepartial-search
and algebraic techniques require at least one scalar comparison to be performed on each vector of the
codebook,andhencetheupperbound onthereductionincomplexityislimitedbythecodebooksize. The
ROImethods eliminate mostcodevectorsin thecodebook withoutthe needfor avectorcomparison, and
thusformanextremelypowerfulapproach,withthepotentialforsubstantialperformanceimprovements.
3.1 Partial Distance Search
ForKcomponents,theEuclideanmetricrequirestheaccumulationofKpartialdistortionvalues,aftereach
successivetermisaddedintothetotaldistortion. Theideaofthepartialdistancesearchistoterminatethe
searchiftheaccumulatedpartial distortionexceedstheminimumdistortionsofarin thecodebook search.
This approach requires anadditionaltest within theloop, andmayormaynotresultin anynetsavings.
Processorswhichheavilyutilizecachingandbranchpredictionarelesslikelytobeabletoeectivelyutilize
apartialdistancesearch,sincetheoverheadofbranchtestingandpipelinemissesmustbetradedoagainst
feweroating-pointcomputations.
Representativework in this areaincludes [2], [3], [4], and [5]. Note that the title of [3] implies asimilar
techniqueto thatpresentedhere,whereasitin factpresentsamethod fororderingthecomponentswithin
eachcodevector,ratherthanorderingthecodebookitself.
3.2 Algebraic Simplication
Severalalgebraicsimplicationmethodshavebeenreportedintheliterature,suchas[6],[7],[8],[9],and[10].
Themethodof[7] hasbeenused intheexperimentalresultspresentedhereforthepurposeofcomparison.
LetxbetheinputvectorofdimensionK,x
k
becomponentkofx,andx themeanofvectorx. Dening
d 0
( x;y
i
) = K(x y
i )
2
(4)
itmaybeshownthat
d(x;y) d 0
Thisdependsonlyonprecomputedvalues,andiseasily calculatedforeachcodevector. Thefulldistortion
calculationisonlyperformedforcodevectorswhichsatisfy
d 0
(x;y
i
) < d
min ( x;y
i
) (6)
whered
min ( x;y
i
)istheminimumdistortion foundupuntilthat pointinthesearch.
3.3 Region-of-Interest Search
Thesemethods utilizeprecomputedparametertables,and henceincreasethememoryrequirements.
How-ever,substantialreductionsin computationalrequirementsmaybeachieved. Methods which include
par-titioningthevectorspace include thosein [10], [11],[12],[13]and[14], although ourwork isbasedon the
distanceandtriangle-inequalitymethods of[15].
Givenaninputvectorxandastartingpointc
i
,Figure1showsthat thebestmatchinthecodebookmust
satisfy r x h i r k r x +h i (7) where h i
is the distance from x to c
i
. This may be visualized in two dimensions as a region bounding
kxkh
i .
The values of r
i
are stored alongside each codevector. From any current codevector c
i
, the value h
i is
calculated. Only codevectors which fall inside the bounding areagivenby r
x h
i
need be tested. This
requirestwoscalarcomparisonstodetermineifthecodevectorisacandidate. Ifthecodevectorisapossible
candidate,afulldistancesearchisperformed.
Theinitialcodevectorc
i
maybechosenat random,orasthecodevectorwhoser
i
isclosesttor
x
. Ateach
stage, if a new codevector with lower distance is found, the current codevectorc
i
is replaced with that
codevector.
b
b
b
x c ib
hi r xb
b
ck c j x1 x 2 r x h i r x + h iFigure1: Searchmethodbasedthe distancefromtheorigin. Givenatargetvectorxandcurrentbest-match
ci,onlyvectorswithintheradiusrxhi(shaded)needbeexamined.
Anotherregion-basedmethod isbasedon thetriangle inequality, which for atest vectorx and codebook
vectorsc
i andc
j
statesthat
Thisis illustratedin Figure2. Thesearch proceedsbycalculatingh
i
forthecurrentcandidatecodevector
c
i
. Only codevectorswhich haved(c
i ;c
k
) 2h
i
are tested further, sinceonly these couldpossiblyyield
lessdistortion. Intheillustration,codevectorc
k
would betested, but codevectorc
j
would notbetested.
Thekeyto the method is aprecomputedtable of allthe distances d(c
i ;c
j
)in the codebook C, requiring
N(N 1)=2 entries. A good choice of the initial starting vector c
i
will minimize the size of the search
hypersphere.
b
b
b
x
c
i
b
b
c
k c
j
b
2hi
r
x
ri
hi
x1 x
2
Figure2: Searchmethodbasedonthe triangle-inequality. Givenatargetvectorx andcurrentbest-matching
vectorci,onlycodevectorswithinaradius2hi ofci (shadeddisk)needbeexamined.
4 Improvements to the Triangle-Inequality Method
The triangle inequality, when applied to fastsearch methods in many dimensions, eectively produces a
shrinking hypersphere, bounded by the distancebetweenthecandidate codevectorand themost recently
found minimum-distance codevector. The aim is to shrink the hypersphere as rapidly as possible, and
therebyexcludethelargestpossiblenumberofcodevectorsin theshortestpossibletime.
Reducingthesearchspaceasrapidlyaspossiblemaybeachievedbyreorderingthedistancetableasfollows.
Foreach codevectori, we sort each entryin the corresponding rowof the distance table D(i;k);k 2 N,
in increasing order of distance from the codevector with index i. The search space for each candidate
codevectoristhenguaranteedto reduceat thefastest possiblerateforagivencodebook,sincetheclosest
codevectorsareexaminedrst. This re-orderingrequires apermutationtable tokeeptrackoftheoriginal
codebook indexes corresponding to each row of the sorted distance table. However, the search time is
substantiallyreduced,aswillbeshown.
Severalfactorsimpactontheexactamountofcomputationalreductionpossible. Theseareprimarily:
1. Thevectordimensionk.
2. Thecodebook sizeN.
3. Thestatisticaldistributionofthecodevectors,andanyintra-vectorandinter-vectorcorrelation.
In order to evaluate the eectiveness of the proposed method, two sets of experiments were conducted.
The resultsfrom the rst experiment, utilizing codebooks assembled from independent, random variates,
areshowninTable1. Firstly,itisclearthatthetriangle-inequalitybasedmethodsoutperformalgebraicand
magnitudesearches,byaconsiderablemargin. Fora1024-elementcodebook,thetrianglemethodwasable
toreducetheaveragenumberofoating-pointcomputationstoabout20%oftheoriginal. Applicationofthe
magnitude-sortingpermutationtothedistancetablewasfurtherabletoreducethenumberofcomputations,
downtoabout10%oftheexhaustivesearch.
Codebook SearchMethod
Size r-search 4-search sorted-4 algebraic
1024 0.90 0.20 0.10 0.99
2048 0.86 0.15 0.09 1.00
Codebook SearchMethod
Size r-search 4-search sorted-4 algebraic
1024 1.06 0.79 0.66 1.05
2048 1.06 0.68 0.57 1.03
Table 1: Results for 8-dimensional(upper)and 16-dimensional(lower) codevectors. The searchmethodsare
radius-search (r-search), triangle inequality (4-search), sorted triangle inequality (sorted-4) and algebraic
simplication based on Poggi's derivation [7 ]. The results are normalized, so that 1.0 corresponds to the
complexityofafull (exhaustive)searchfor acodebookofthesamedimension.
Next,an image encoding problem utilizing VQ wasexamined. Aproduct-code imageVQwasdeveloped,
similarto that reportedelsewhere[1]. Themean/shape/gainimage encoding VQtransmitsthemeanand
gainof an image block as scalars, with theshapeof the image block encoded using avector index. Five
standardimageswereusedto generatethecodebook{\bridge",\camera",\goldhill" and\lena". Testing
wascarriedoutusingthe\bird"image. Theimageswereblockedinto44sub-blocks,themeanremoved,
andthegainfactoredout.
Table2 illustrates the advantageof the proposed method. For alarge codebook (1024 codevectors), the
number of operations requiredfor a triangular-search | approximately58% that of a full-search | was
reducedto48%whenusingthesorted-permutationdistancematrixasproposedhere. Forasmallercodebook
(256codevectors),thenumberofoperationsrequiredforatriangular-search|approximately74%thatof
afull-search|wasreducedto 65%whenusingthesorted-permutationdistancematrixmethod.
SearchMethod
Codebook r-search 4-search sorted-4 algebraic
102416 1.08 0.58 0.48 1.06
25616 1.08 0.74 0.65 1.06
Table2: Resultsformean/shape/gainimageVQ.Theimageshapevectorsare 44,codebook sizes256and
1024. Thesearchmethodsareradius-search(r-search),triangleinequality(4-search),sortedtriangleinequality
(sorted-4),and algebraicsimplication.The resultsarenormalized,sothat 1.0correspondsto 50176opsfor
thelargercodebook,and12544forthesmallercodebook.
5 Conclusions
Thisnotehaspresentedanimprovementtothetriangular-inequalityfastsearchmethodforsearchingvector
codebooks. Some additionalmemoryoverthat requiredfor thedistance tableis needed,in order to store
thesortedpermutationsforeachcodevectordistance. Thissimpleenhancementtothepreviously-published
References
[1] A.GershoandR.M.Gray,VectorQuantizationandSignalCompression,KluwerAcademicPublishers,
1992.
[2] C-D. Bei and R. M. Gray, \An Improvement of the Minimum Distortion Encoding Algorithm for
VectorQuantization", IEEE Transactions on Communications, vol.COM-33, no.10, pp.1132{1133,
Oct.1985.
[3] K.K. PaliwalandV. Ramasubramanian, \Eectof Ordering theCodebook ontheEÆciency of the
PartialDistanceSearchAlgorithmforVectorQuantization",IEEE TransactionsonCommunications,
vol.37,no.5,pp.538{540,May1989.
[4] James McNames, \Rotated Partial Distance Search for Faster Vector Quantization
Encod-ing", IEEE Signal Processing Letters, vol. 7, no. 9, pp. 244{246, Sept. 2000, also available
http://ece.pdx.edu/~mcnames/Publications/RPDS.pdf
[5] S. C. Tai, C. C. Lai, and Y. C. Lin, \TwoFast NearestNeighbour Searching Algorithmsfor Image
VectorQuantization", IEEE Transactions on Communications, vol. 44, no.12, pp.1623{1628,Dec.
1996.
[6] C.-H.LeeandL.-H.Chen, \FastClosestCodewordSearchAlgorithmforVectorQuantization", IEE
Proceedings,vol.141,no.3,pp.143{148,June1994.
[7] G. Poggi, \Fast Algorithm for Full-Search VQ Encoding", Electronics Letters, vol. 29, no. 12, pp.
1141{1142,June1993, alsoavailablehttp://diesun.die.unina.it/GruppoTLC/poggi/R03.pdf
[8] K.-L. Chung, W.-M., and Yan J.-G. Wu, \A Simple Improved Full Search for VectorQuantization
BasedonWinograd'sIdentity", IEEESignalProcessingLetters,vol.7,no.12,pp.342{344,Dec.2000,
alsoavailablehttp://ece.pdx.edu/~mcnames/Publications/RPDS.pdf
[9] L. Torres and J. Huguet, \An Improvement on Codebook Search for Vector Quantization", IEEE
Transactionson Communications,vol.42,no.2/3/4,pp.656{659,Feb.1994.
[10] S.Baek,B.Jeon,andK-M.Sung,\AFastEncodingAlgorithmforVectorQuantization",IEEESignal
Processing Letters, vol.4,no.12,pp.325{327,Dec.1997.
[11] V.Ramasubramanianand K.K.Paliwal, \VoronoiProjection-BasedFastNearest-NeighbourSearch
Algorithms: Box-SearchandMappingTable-BasedSearchTechniques", DigitalSignalProcessing,vol.
7,no.4,pp.260{277,Oct.1997.
[12] J.Mielikainen, \ANovelFull-Search VectorQuantizationAlgorithm Basedon theLawof Cosines",
IEEESignalProcessingLetters,vol.9,no.6,pp.175{176,June2002.
[13] M.R.Soleymaniand S.D. Morgera, \AFastMMSE EncodingTechniqueforVectorQuantization",
IEEETransactions onCommunications, vol.37,no.6,pp.656{659,June1989.
[14] M.R.SoleymaniandS.D. Morgera, \AnEÆcientNearestNeighbourSearchMethod", IEEE T
rans-actionson Communications, vol.COM-35,no.6,pp.677{679,June1987.
[15] C-M.Huang,Q.Bi,G.S.Stiles,andR.W.Harris,\FastFullSearchEquivalentEncodingAlgorithms
forImageCompressionUsingVectorQuantization", IEEETransactions on Image Processing,vol.1,