Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
6-1-1996
A Study of trellis coded quantization for image
compression
Kannan Panchapakesan
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
Recommended Citation
A
Study
of
Trellis Coded
Quantization
for
Image Compression
by
Kannan
Panchapakesan
A Thesis
submitted
in
partial
fulfillment
of
the
requirementsfor
the
degree
of
Master
of
Science
in
Electrical
Engineering
Department
of
Electrical
Engineering
College
of
Engineering
Rochester Institute
of
Technology
Rochester,
New York
A Study of Trellis Coded Quantization
for Image Compression
by
Kannan Panchapakesan
A Thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science
.
In
Electrical Engineering
Approved by:
Professor S.W. McLaughlin, Thesis Advisor
Professor S.A. Dianat
Professor rvLR. Raghuveer
Professor R. Unnikrishnan, Department Chair
Department of Electrical Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
Thesis Release Permission
A Study of Trellis Coded Quantization
for Image Compression
I,
Kannan Panchapakesan, hereby grant permission to the 'Wallace Memorial
Library of Rochester Institute of Technology to reproduce my thesis in whole or in
part. Any reproduction will not be for commercial use or profit.
Signature:
_
/
Dedicated
to the
loving
memory
of
my
uncle
Acknowledgments
I
wouldlike
to thank my
advisorDr.
S.W.
McLaughlin,
without whosehelp
and guidance
this thesis
would nothave been
possible.His
constant support and encouragementreally
helped
me alot
and mademy
life
alot
easier.He's
simply
awonderful person and
I
enjoyedworking
withhim.
I
wouldlike
thank
Dr.
M.R. Raghuveer
for
his
useful suggestions as a memberof
my thesis
committee andfor
allhis
help
in
the
implementation
ofthe
wavelettransform.
I
would alsolike
to
place on recordmy
gratitudeto
Dr.
S.A. Dianat for
being
onmy
thesis
committee andproviding
usefulsuggestions.My
thanks
aredue
to my
friends
atthe
Center for
Imaging Science,
Ajit
andWei,
who
helped
me alot in
the
preparation ofthis
thesis
andfor
allthose
fun
times
wehad
together.
I
wouldlike
to thank my absolutely terrific
brother,
Ramani,
without whosehelp
and
encouragement,
I
would remained a computer-illiterate.I
wouldlike
to
take
this opportunity to thank my
wonderfulroommates,
Deepthi
and
Vinay. for
making my stay
here
atRochester
avery
pleasant experience andfor
simply putting up
with me.My
specialthanks
goto
Dr.
S.
Madhu
andhis
family
for
alltheir
help
andthe
affection
they
have
showered on me.Finally.
I
oweeverything
to my
wonderfulparents,
whohave
alwaysbeen
there
Abstract
Trellis
coded quantizationhas
recently
evolved as a powerful quantizationtech
nique
in
the
world oflossy
image
compression.The
aim ofthis
thesis
is
to investigate
the
potential oftrellis
coded quantizationin
conjunction withtwo
ofthe
most popularimage
transforms
today;
the
discrete
cosinetransform
andthe
discrete
wavelettrans
form.
Trellis
coded quantizationis
compared withtraditional
scalar quantization.The 4-state
andthe
8-state
trellis
coded quantizers are comparedin
an attemptto
come
up
with a quantifiabledifference in
their
performances.The
useof pdf-optimizedquantizers
for
trellis
coded quantizationis
also studied.Results
for
the
simulations performed ontwo
gray-scaleimages
at anuncodedbit
rate of
0.48
bits/pixel
arepresentedby
way
of reconstructedimages
andthe
respectivepeak signal-to-noise ratios.
It is
evidentfrom
the
results obtainedthat trellis
codedquantization outperforms scalar quantization
in
both
the
discrete
cosinetransform
and
the
discrete
wavelettransform
domains. The
reconstructedimages
suggestthat
there
does
not seemto
be
any
considerablegainin going from
a4-state to
a8-state
trellis
coded quantizer.Results
also suggestthat
considerable gain canbe
had
by
employing
pdf-optimized quantizersfor
trellis
coded quantizationinstead
of uniformContents
1
Introduction
1
1.1
Background
andMotivation
3
1.2
Overview
andContributions
ofthe thesis
4
2
Review
ofTransform
Theory
6
2.1
The Optimal Transform
7
2.2
The Discrete
Cosine Transform
8
2.3
The Wavelet Transform
10
2.3.1
The
Continuous
Wavelet Transform
10
2.3.2
The Discrete Wavelet Transform
13
2.4
Summary
18
3
Review
ofQuantization
Theory
19
3.1
Scalar
Quantization
19
3.2
The Uniform
Quantizer
21
3.2.1
Average
Distortion
22
3.2.2
Signal
to
Noise Ratio
26
3.3
The
Optimal
Quantizer
28
3.3.1
The
Optimal
Encoder
for
aGiven Decoder
28
3.3.3
Algorithm for design
of alocally
optimal quantizer31
3.4
Bit
Allocation
32
3.4.1
Optimal bit
allocationfor
the
Karhunen-Loeve Transform
. .33
3.4.2
Optimal bit
allocationfor
aDiscrete Cosine Transform based
system
34
3.4.3
Optimal bit
allocationfor
aDiscrete Wavelet Transform based
system
37
3.5
Vector
Quantization
39
3.6
Trellis Coded
Quantization
42
3.6.1
A
numericalexample of a4-state
TCQ
45
4
Simulation Results
48
4.1
The Discrete Cosine Transform
based
system49
4.1.1
Results
for
the
LENA
image
52
4.1.2
Results
for
the
AIRPLANE image
54
4.2
The Discrete Wavelet Transform based
system55
4.2.1
Results for
the
LENA image
57
4.2.2
Results
for
the
AIRPLANE image
58
4.3
Summary
ofResults
58
4.4
Conclusions
59
List
of
Tables
3.1
The Lloyd II
algorithm32
3.2
The LBG
algorithm41
4.1
Results
for
the
LENA image
at an uncodedbit
rate of0.48 bits/pixel
for
the
various quantization schemes(sq
= scalarquantization;
tcq-4st
=4-state
TCQ
; tcq-8st
=8-state
TCQ
;
opt = pdf-optimizedquantization) in
the
DCT-based
system52
4.2
Results
for
the
AIRPLANE image
at an uncodedbit
rate of0.48
bits/pixel for
the
various quantization schemes(sq
= scalar quantization; tcq-4st
=4-state
TCQ;
tcq-8st
=8-state
TCQ
;
opt =pdf-optimized
quantization) in
the
DCT-based
system54
4.3
Results for
the
LENA image
at an uncodedbit
rate of0.48
bits/pixel
for
the
various quantization schemes(sq
= scalarquantization; tcq-4st
=
4-state
TCQ
; tcq-8st
=8-state
TCQ
)
in
the
DWT-based
system.57
4.4
Results
for
the
AIRPLANE image
at an uncodedbit
rate of0.48
bits/pixel for
the
various quantization schemes(sq
= scalar quantization;
tcq-4st
=4-state
TCQ;
tcq-8st
=8-state
TCQ)
in
the
DWT-based
List
of
Figures
1.1
Block
diagram
of atypical
transform-based image
compression system.2
2.1
(a)
An
Ni
xAr2
image is
divided
into
Nbi
xNb2
blocks,
each of sizeM
x
M.
(b)
A
DCT
transformed
block is
shown,
indicating
the
positionof
the
DC
andthe
highest
frequency
AC
coefficients10
2.2
(a)
The
mother wavelet(Cubic
B
Spline),
(b)
The
mother waveletcontracted
by
afactor
of2.
(c)
The
mother waveletdilated
by
afactor
of
2
12
2.3
Time-Frequency tiling
affordedby
the
wavelettransform
14
2.4
Basic
filter bank
implementation
ofthe
wavelettransform
15
2.5
The
process of waveletdecomposition
of animage implemented
using
digital
filters
16
2.6
A
3-level
octave-band or waveletdecomposition
of animage
into
its
constituent
frequency
subbands.The
spectraldecomposition
and ordering
ofthe
channelsis
shown17
3.1
A
one-dimensional quantizer withthe
end points and output pointsmarked on
it
20
3.2
The
Staircase
type
input-output
characteristic of a quantizer22
3.4
(a)
A
4-state
trellis
with subsetlabeling,
(b)
Codebook
partitioning
for
2
bits/sample
TCQ
43
3.5
(a)
Symbolic
representation of aTCQ,
whereX is
the
input
vectorand
Q(X),
the
quantized vector,(b)
The
format
ofthe
output ofTCQ.
45
3.6
(a)
Partitioning
ofthe
scalar quantizerinto
four
sub-quantizers,
for
trellis
coding
at a rate of2
bits/sample,
(b)
An
example of a4-state
trellis
indicating
the
distortions
at each node andthe
best
paththrough
the trellis
47
4.1
Block
diagram
ofthe
scalar quantization-DCTbased
system50
4.2
Block diagram
ofthe trellis
coded quantization-DCTbased
system. .51
4.3
Block
diagram
ofthe
Wavelet based
system56
4.4
Monochrome LENA image
(512
x512)
61
4.5
lena-sq
(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=25.298012
dB
62
4.6
lena-sq-opt (DCT).
Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=25.644171
dB
63
4.7
lena-tcq-4st
(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=28.116083
dB
64
4.8
lena-tcq-4st-opt
(DCT).
Bit
rate(uncoded)
=0.48
bits/pixel. PSNR
=
27.265438
dB
65
4.9
lena-tcq-8st
(DCT).
Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=28.126045
dB
66
4.10 lena-tcq-8st-opt
(DCT).
Bit
rate(uncoded)
=0.48
bits/pixel. PSNR
=
27.495291
dB
67
4.12 airplane-sq
(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=22.917770 dB
69
4.13
airplane-sq-opt(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=24.069099 dB
70
4.14
airplane-tcq-4st(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=
24.415953 dB
71
4.15
airplane-tcq-4st-opt(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel. PSNR
=
23.161022
dB
72
4.16
airplane-tcq-8st(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=
24.463374
dB
73
4.17
airplane-tcq-8st-opt(DCT).
Bit
rate(uncoded)
=0.48 bits/pixel.
PSNR
=
23.250565 dB
74
4.18
lena-sq
(DWT). Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=27.762236
dB
75
4.19 lena-tcq-4st
(DWT).
Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=28.685629
dB
76
4.20 lena-tcq-8st
(DWT).
Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=28.710915
dB
77
4.21
airplane-sq
(DWT).
Bit
rate(uncoded)
=0.48
bits/pixel.
PSNR
=23.998899
dB
78
4.22
airplane-tcq-4st(DWT).
Bit
rate(uncoded)
=0.48
bits/pixel. PSNR
=
27.339706 dB
79
4.23
airplane-tcq-8st(DWT). Bit
rate(uncoded)
=0.48
bits/pixel. PSNR
Chapter
1
Introduction
In
today's
world of global multimediainter-networking,
there
is
anever-growing
needto
transmit
and storedigital
images
effectively,
which necessitatesthe
useof compression.
This
for
the
most part explainsthe
pace at whichthe
field
ofimage
compressionis expanding
andthe
fact
that
researchers areconstantly
trying
to
comeup
withbet
ter
schemesfor
compressing images.
Some
ofthese
schemes are customtailored to
a particular application while others are
broad based
and usedin
avariety
of applications.
A
good example ofthe
former
wouldbe
the
FBI
waveletfingerprint
image
compression standard which
has been
designed
exclusively
for fingerprint
images
[8].
An instance
ofthe
latter is
JPEG,
the
ISO-CCITT
stillimage
compression standardwhich
is
capable ofcompressing
a widevariety
ofimages
efficiently
[21].
In
this
thesis,
atransform-based
approachto
image
compressionis
used.There
fore, it
wouldbe
instructive
to
gain a goodunderstanding
ofit.
Any
transform-based
image
compression system canbe
representedas shownin
Figure
1.1.
The
top
three
blocks
together
form
the
process called encoding,whereby the
input image is
converted
into
coded(compressed)
image data.
The
encodeddata
is
then
stored ortransmitted.
For
the
purpose ofthis
thesis,
it is
assumedthat the
transmitted
or(an
exact reverse ofthe
process ofencoding),
whichis
constitutedby
the three
blocks
below.
The
transform
processesthe
image
data
in
such away
asto
makeit
moreamenable
to
compression(for
e.g.,
it
decorrelates
the
image data). Quantization
mapsthe
output ofthe transform
(which
cantake
on aninfinite
number ofvalues)
to
afinite
set ofpre-determinedvalues sothat
they
canbe
representedby
afinite
numberof
bits.
Coding
is
alossless
procedure which produces codewordsin
accordance withthe probability
distribution
ofthe
quantizer output.Coding
attemptsto
achievethe
entropy
rate ofthe
quantized outputby
removing
further
redundancy
in
it.
Image ->
Forward
Transform
Quantization
->Encoding
Coded
Image dataV
Reconstructed
ImageInverse
Transform
<-Inverse
Quantization
<-Decoding
Coded Image
j
-data
Figure 1.1: Block diagram
of atypical transform-based
image
compression system.This
compressionis
saidto
be
'"lossy"with all of
the
loss occurring in
the
quantizer
(ignoring
the
almost negligible precisionloss due
to the
transform
itself,
in
any
practical system).
One
ofthe
mostwidely
usedtheoretical
benchmarks
to
gaugethe
performanceof
any
compression systemis
the
rate-distortionfunction.
Rate-distortion1theory
is
the
information
theoretic
approachto the study
of optimal sourcecoding
systems.The
approach wasfirst
suggestedin Shannon's
originaldevelopment
ofinformation
theory.
Shannon's
sourcecoding
theorem
combines an ergodictheorem
withthe
idea
of randomcoding
in
orderto
characterizethe
optimal achievablebehavior
of1It
is
also sometimes referredto
asDistortion-rate
theory
ortheory
of
sourcecoding
with afidelity
a particular mathematical model of
any
system.This
behavior
is
characterizedby
the
rate-distortionfunction
ofthe
source anddistortion
measure.The
rate-distortionfunction
provides us with an absolutelower
bound
onthe
achievabledistortion
at agiven rate or
vice-versa
for
any
given system.For
a morein-depth
and mathematicaltreatment
ofthe
rate-distortiontheory,
the
readermay
referto
[1]
[6].
1.1
Background
and
Motivation
In
this
thesis,
arelatively
new quantizationtechnique
known
astrellis
coded quantization
(TCQ)
is
studied.A
very
brief description
ofit
is
providedhere
but
a moredetailed description
canbe found in
Section
3.6.
Trellis
coded quantization owesits
existence
to
its
vastly
populardual,
trellis
coded modulation(TCM),
developed in
the
late 1980s
by
Ungerboeck
[18]
[19].
TCM is
one ofthe
most sophisticatedmodulation
schemes availabletoday.
It is
being
appliedto
a vast spectrum of applicationsincluding
all phoneline
modemstransmitting
above4800
bits/sec
(for
e.g., the
v.34
modem)
andin
satellite communications.Though
the
TCQ
schemehas
not made such adent in
commercial products sofar,
that
day
is
certainly
notfar
off,
because
ofits
power and versatility.The
beauty
ofthe
TCQ
lies in
the
fact
that
it is
ableto
deliver
some ofthe
punchof a vector quantizer(VQ)
whileretaining
only
afraction
of a
VQ's
complexity.This
vast, potential ofthe
TCQ
was a majormotivating
fac
tor
behind
this
thesis.
The
aim ofthis
work wasto
explorethe
realm ofTCQ
andbear
witnessto
its
potentialin
two
very
different
domains,
namely
the
discrete
cosinetransform
(DCT)
andthe
discrete
wavelettransform
(DWT).
andto
compareit
to
the
moretraditional
scalar quantization.This
workis
by
no meansthe
first
attemptto
explorethe
potential ofTCQ. This
work
in fact
owes a greatdeal
to
previous workby
Fischer
andWang
[3],
Marcellin
about
this
workis
that the
issues
of quantization andcoding
areisolated
to
focus
exclusively
onthe
former.
1.2
Overview
and
Contributions
ofthe
thesis
Chapter
2
develops
the
background
materialin
transform
theory
necessary
for
the
restof
the thesis.
It
briefly
goes aboutdescribing
the
Discrete Cosine Transform
(DCT),
its
salient properties andits
applicationsto
image
compression.The
remainderofthe
chapter
is
devoted
to
discussing
the
Wavelet Transform
(continuous
anddiscrete)
andthe
realizationofthe
Discrete Wavelet Transform
(DWT)
using
filter banks
necessary
for its
applicationto
image
compression.Chapter
3
deals
withthe
fundamentals
of quantizationtheory.
The
readeris first
introduced
to
the
concept of scalar quantization.Uniform
quantizationis
then
dealt
with
in detail. The
chapterthen
goes onto
explainthe
theory
behind
pdf-optimizedquantization and
the
algorithm usedin
this thesis to
build
alocally
optimal quantizer.The
allimportant idea
ofbit-allocation
is
brought
out andthen the
allocationformulas
for
two
different
systems,
i.e.,
the
DCT
andthe
DWT,
arederived.
The
chapterconcludes
by
taking
alook
at vector quantization andtrellis
codedquantization, the
crux of
this
work.In
Chapter
4,
the two systems,
i.e.,
DCT-based
andthe
DWT-based,
that
weresimulated
in
this
work aredescribed
in detail.
Then
the
results ofthe
simulationsare presented
by
way
ofthe
reconstructedimages
andtheir
peak signal-to-noise ratio(PSNR)
valuesfor
the
scalar andtrellis
coded quantization schemes.The
contributions ofthis thesis
arethe
following
:The
issues
of quantization andcoding
areisolated in
orderto
concentratesolely
quan-tization
are comparedin
orderto
comeup
with a quantifiabledifference
in
performance.
The
working
ofTCQ
is
studiedin
two very
different
and populardomains,
namely the
DCT
andthe
DWT.
Investigations
are carried outto
see whetherTCQ
improves
features
and reducesbackground
noise andartifactsin images
abovethat
predictedby
the
maximum1.53
dB
cell shape gain.It is
shownhow
TCQ
offersbetter
performancein
conjunction with optimalquantization
in
the
case of aDCT-based
system.A
comparison ofthe
performance of a4-state
and a8-state
TCQ
is
also preChapter
2
Review
of
Transform
Theory
A
transform
is
amapping from
afunction
of onedomain
to
afunction
ofanother,
subject
to
a set of predefined rules.In
the
area ofimage
compression, the
major goals of atransform
areto
decor-relate
the
original signal(image)
andto
redistributethe energy
among
only
a smallnumber of
transform
coefficients.This
leads
to
a situation wheremany
insignificant
coefficients can
be
thrown away
after appropriate quantization.A
transform
is
saidto
be
one-dimensional if it is
performed over a single row
or column of pixels.
It is
saidto
be
two-dimensional
(2-D),
if it is
performed overa
2-D
array
of pixels.All
transforms
appliedto
images
areessentially
2-D,
though
sometimes
it is
more practicalto
realize a2-D
transform
by
means oftwo
successive1-D
transforms.
There
aremany
characteristicsthat
aredesirable in
atransform,
whenit is
usedfor
the
purpose ofimage
compression :Image decorrelation
:There is
ahigh
degree
of correlationbetween
pixelsin
any
naturalimage.
Therefore,
one ofthe
major goals ofany transform is
to
decorrelate
the
sourceimage
as much as possible.vari-ations
among
images,
the
optimumtransform usually
depends
onthe
givenimage. This
makesfinding
the
basis
functions
atedious
andcostly task.
This
also means
that these
optimumbasis functions
mustbe
sentto the
decoder
which results
in
a reduction of compression efficiency.Therefore,
it is
generally necessary to trade
off optimum performancefor
atransform
whosebasis
functions
areimage-independent.
Fast
implementation
:There is
alwaysthe
needfor
atransform to
be
amenable
to
fast implementation in
orderto
save on cost andtime.
2.1
The
Optimal Transform
An
optimaltransform
is
onethat
is
capable ofextracting
allthe
correlation presentin
the
input
data,
i.e.,
completely
decorrelating
the
input data. The
Karhunen-Loeve1transform
(KLT),
which wasfirst introduced
by
Karhunen
andLoeve,
is
an exampleof such a
transform.
In
orderto
understandthe
working
ofthe
KLT,
considerthe
following.
Let
X
be
a real-valued vector whose components are correlated with one another.It is
possible
to
select an orthogonal matrixT,
for
agiven pdfdescribing
X,
that
will makeY
=TX
have
pairwise uncorrelated components.Let
Rx
E[XXT],
denote
the
autocorrelation matrix of
the
input
columnvectorX. Let
u,-denote
the
eigenvectorsof
Rx
(normalized
to
unitnorm)
andA;
the
corresponding
eigenvalues.Since
any
autocorrelation matrix
is
symmetric and nonnegativedefinite,
there
arek
orthogonaleigenvectors and
the
corresponding
eigenvalues are real and nonnegative.Let
usassume,
withoutany
loss
ofgenerality that
Ai
>
A2
>
Xk
>
0.
^^This
transformis
alsocommonlyreferredtoasthehotelling,
principalcomponent,
oreigenvectorThe KLT
matrix canthen
be
constructed asT
= VT. whereV
=[uiu2---ufc],
that
is,
the
columns ofV
arethe
eigenvectors ofRx-Then
the
autocorrelation matrix ofthe
outputY is
givenby
Ry
=E[YYT]
=E[VTXXTV]
Ai
0
0
0
A2
0
=
VTRXV
0
0
A*
Thus
it
canbe
seenthat the
KLT
does
indeed decorrelate
the
input
vector.It
also
follows
that the
variances ofthe transform
coefficients arethe
eigenvalues ofthe
autocorrelation matrix
Rx. For
proof ofoptimality
ofthe
KLT,
the
readeris
referredto
[5].
As
canbe
inferred
from
the
abovediscussion,
Rx,
the
autocorrelation matrix ofthe
input
vectorA
.is
requiredto
constructthe
KLT
matrix.In
otherwords, the
KLT is input
dependent.
In
the
context oftransformation
ofimages,
althoughthe
KLT
wouldbe
optimumfor
a givenimage,
it
would alsobe image
dependent.
The
KLT is
therefore
not a good candidatefor
animage
transform.
2.2
The
Discrete Cosine
Transform
The 2-D NxN Discrete
Cosine
Transform
orDCT.
usedin image processing
applications
has
forward
andinverse
relations givenby
equations(2.1)
and(2.2)
respectively..v-iAr-1
F(u,v)
=-C(u)C(v)
}_^
2^
f{x-y)cos
cos-4
x=0 i/=0
2X
2X
f(xiy)
=^LL
C(u)C{v)F(u,v)cos
cos(2.2)
where
C(u),C(v)=l1Jy/*
io[u^
=^
v h y '
1
otherwise.In
this
thesis,
the
sourceimage is
first blocked
into 8x8
blocks
andthen the
2-D
DCT
appliedfor
each one ofthese
blocks.
Each
one ofthese
8x8 image
blocks
canbe
viewed as a64-point
discrete
signal,
whichis
afunction
ofthe two
spatialdimen
sions x and y.
The DCT
takes
such a signal anddecomposes
it into 64
orthogonalbasis
signals.Each
contains one ofthe
64
unique2-D
"spatial
frequencies"
which
comprise
the
input
signal's"spectrum".
The
output ofthe
DCT is
the
set of64
basis-signal
amplitudes or"DCT
coefficients"
whose values are
uniquely
determined
by
the
particular64-point
input
signal.The DCT
coefficients canthus
be
regarded asthe
relative amount ofthe
2-D
spatial
frequencies
containedin
the
64-point
input
signal.The
coefficient with zerofrequency
in
both
x andy
directions
is
referredto
asthe
"DC
coefficient"and
the
remaining 63
coefficients are referredto
asthe
"AC
coefficients"as
in
Figure
2.1.
Because
pixelvaluesvary slowly
from
pointto
point across atypical
image,
the
DCT
lays
the
foundation
for
achieving
data
compressionby
concentrating
most ofthe
signalenergy
in
the
lower
spatialfrequencies.
For
an8x8
pixelblock from
atypical
image,
most ofthe
spatialfrequencies have
zero or near-zero amplitude andhence
need notbe
encoded.At
the
decoder,
the
IDCT
reversesthe
effect ofthe
DCT. It takes
the
64
DCT
Nbi-1
v.
0
1
2
(a)
DC
Coefficient
Highest
frequency
AC Coefficient
NB2"1
M-l
V
0
1
2
M-l
(b)
Figure 2.1:
(a)
An
Ni
xAT2
image is
divided into
Nbi
xNb2
blocks,
each of sizeM
x
M.
(b)
A
DCT
transformed
block is
shown,
indicating
the
position ofthe
DC
andthe
highest
frequency
AC
coefficients.If
the
DCT
andthe
IDCT
canbe
computed witharbitrary
precision andif
there
was no quantization
involved,
the
64-point
input
signal canbe
exactly
recovered.In
principle,
the
DCT introduces
noloss
to the
sourceimage
pixels.It
merely transforms
them
to
adomain
wherethey
canbe
moreefficiently
encoded.2.3
The Wavelet Transform
2.3.1
The
Continuous
Wavelet Transform
The
continuous wavelettransform,
in
general,
canbe
expressed asthe
time-integrated
product of asignal
s(t)
with a set ofanalyzing
functions,
which aresimply the
dilated
and shifted versions of a given
"mother
wavelet".Since
the
mother wavelet canhave
many
different
forms,
subjectto
certain mathematicalconstraints,
the
analyzing
functions
cantake
animpressive variety
offorms,
which packsthe
wavelettransform
The
continuous wavelettransform
is
givenby
/CO
W(x,
y)
=(s(t)
h,,v(t))
=/
8(t)hlv(t)dt(2.3)
Jt= OO
The analyzing
functions
hXty{i)
for
the
wavelettransform
arehxM
=^=g[(t-y)/x]
(2.4)
y/X
where
g{t)
is
afunction
such asthe
one shownin Figure
2.2,
known
asthe "mother
wavelet"
.
The
nicething
about such a mother waveletis
that
it is
offinite duration
and
therefore
its
energy
is
concentratedin
a small region.The
inverse
wavelettransform
is
givenby
.w=.-
r
r
w{x-ylh-'{t)^y
(2.5)
Jx=oo Jy=oo
Xwhere x
>
0
and zis
a normalizationconstantdetermined
by
g(t).The
mathematical constraintsimposed
onthe
mother waveletin
orderto
insure
perfect reconstruction of
the
signal afterthe
inverse
transform
are asfollows.
g(t) has
finite
energy,
i.e.,
I"
\9(t)\2dt
(2.6)
Jt= CO
is finite.
The Fourier
transform
ofg(t),
G(w),
is
suchthat
r
\G(w)\2
Jw=oo
\w\
is finite. It
canbe
shownthat
these
conditionslead
to the
requirementthat the
time
average ofthe
waveletfunction be
zero,
i.e.,
/OO
L
=0
12 14
Figure 2.2:
(a)
The
mother wavelet(Cubic B Spline),
(b)
The
mother wavelet conAs
a result ofthis
constraintthe
wavelettransform
removesthe
DC
componentof
the signal; this
DC
component canbe
addedback
to the
signal afterperforming
the
inverse
wavelettransform.
Any
signals(t)
canbe
expandedin
terms
ofthe
analyzing functions
hXiV(t)
asshown
in
equation(2.5).
The analyzing
functions for
the
wavelettransform
consist oftranslations
anddilations
ofthe
mother wavelet.Increasing
the
parameter xin
equation
(2.4)
stretches g(ty) in
time
andhence
the term "dilation".
Decreasing
x would result
in
"contraction". The
parametery
causesthe
function
to
slidealong
the
time
axis andhence
the term
"translation".
To
understandhow
the
wavelettransform works,
it is
usefulto study the
effectsof parameter x on
the
analyzing functions
themselves.
For
large
values ofx, the
analyzing
functions
are wider andhence
the energy
in
their
spectrais
concentratedin
the
low frequencies.
As
aresult, the
low
frequency
information in
the
signalis
captured
by
the
analyzing functions
withlarge
values of x.Applying
similarlogic,
for
small values ofx, the
analyzing
functions
are narrower andhence
their energy
concentrated
in
the
high
frequency
end ofthe
spectrum.Therefore analyzing
functions
with small values of x capture
the
high
frequency
information in
the
signal.Thus
for
various values ofre, the
analyzing
functions
are capable ofcapturing information
at various scales of
frequency.
Hence,
the
form
oftime-frequency
representation affordedby
the
wavelettransform
is
referredto
asthe
"Time-Scale" representationas
illustrated
in Figure 2.3.
2.3.2
The
Discrete Wavelet
Transform
As
seenin
equation(2.4),
the
continuous wavelettransform
has
a continuous rangeof values
for
x andy
andhas
a certain amount ofredundancy
associated withit.
(freq.)
w0
LOqI'j
wq/4
H 1 h H 1 h
4r
-i i h
8T
i2r
(time)
Figure 2.3:
Time-Frequency
set ofvalues such
that
x = 2kand
y
=2kl
wherek
and/
areintegers. The
resulting
transform,
known
asthe
discrete
wavelettransform
(DWT)
is
then
givenby
/oo
W(k,l)
=/
s(t)h*t2kl(t)dtJt=oo
The corresponding Inverse
DWT
is
givenby
(2.9)
OO OO
*(*)
= *E
E
w(k,i)h2kakl(t)
k=oo I00
(2.10)
where z
is
a normalization constant.An
orthonormalbasis is
obtained when<
/&2*,2*/(^) ^v
vi'tt) >=djkdU'.
Meth
ods
to
construct such orthonormalbases
arebased
onthe
theory
of multiresolutionanalysis
[7].
Using
this
theory,
it
canbe
shownthat the
discrete
wavelettransform
of a signal
is
equivalentto
splitting it into its
constituentfrequency
subbands,
i.e,
the
signalis
subjectedto
a system ofappropriately
designed filter banks. For
a moredetailed discussion
onthe relationship
between
wavelets andfilter
banks,
the
readeris
referredto
[2] [7]
[20].
x(n)-h(n)
g(n)
*2^KP
l2^(t2
h'(n)
A
y(n)g'(n)
Figure 2.4: Basic
filter
bank implementation
ofthe
wavelettransform.
One-dimensional
DWT
(the
separable2-D
caseis
astraightforwardextension)
canbe
described
in
terms
of afilter bank
as shownin Figure
2.4.
An input
signalx(n)
is
alternate samples of
the
outputs ofthese
filters
arethen
discarded,
corresponding
to
decimation
by
afactor
oftwo.
The
decimated
outputs ofthese
filters
constitutethe
reference signalr(n)
anddetail
signald(n)
for
a onelevel decomposition.
For
reconstruction,
interpolation
by
afactor
oftwo
is
performed,
followed
by filtering
using the
low-pass
andhigh-pass
synthesisfilters
h'(n)
andg
(n)
as shown.Provided
the
system satisfiesthe
perfect reconstructionproperty, the
sumofthe
outputs ofthe
synthesis
filters
will givey(n)
=Ax(n
rid),
whereA is
the
gain andrid
the
delay
ofthe
system. Columns Columnsh(n)
LPF
-*<Jj)-*h(n)
LPF
j 3 Rowsg(n)
HPF
-UY-Image
h(n)
LPF
-KD >g(n)
HPF
Rowsg(n)
HPF
-*Ujh-LL
LH
HL
HH
Figure 2.5:
The
process of waveletdecomposition
of animage implemented
using
digital filters.
The 2-D discrete
wavelettransform
for images is
implemented
using
digital
filters
as shown
in Figure 2.5. The image is
first
row-filteredthrough
alow-pass
(LP)
and ahigh-pass
(HP)
filter
andthen
columndownsampled
by
afactor
oftwo.
This
resultsin
aLP
and aHP image
which arehalf
the
size ofthe
originalimage.
Now
eachdownsampled
by
afactor
oftwo,
resulting
in four
equal sized subbands.Each
ofthese
bands is
one quarterthe
size ofthe
originalimage. The low-pass band
can nowbe
subjected
to the
sameprocess,
as above andtherefore
canbe
splitinto four
morebands
and so on.LLL,
LLL
LLH,
LLL
LH,LL
H,L
LLL,
LLH
LLH,
LLH
LL,LH
LH,LH
L,H
H,H
Figure 2.6:
A
3-level
octave-band or waveletdecomposition
of animage into its
constituent
frequency
subbands.The
spectraldecomposition
andordering
ofthe
channels
is
shown.This
type
ofdecomposition
of animage,
as shownin
Figure
2.6,
into its
constituentfrequency
subbandsis known
as octave-band or waveletdecomposition. The
labeling
of a subband
in
the
figure is indicative
ofthe type
offilters
the
originalimage
wassubjected
to
(L
being
Low-pass
andH
being
High-pass).
The
reconstructionofthe
image
from
these
subbandsis
done
by
exactly reversing
the
decomposition
process shownin Figure
2.5,
i.e.,
the
subbands are subjectedto
2.4
Summary
Transform
domain
approachto
compressionis
very
powerful andholds
alot
of potential,
which would explainits
immense
popularity.In
this chapter, three
important
transforms
which arebeing
widely
usedtoday
have
been
reviewed.KLT,
whichis
the
optimal
transform
for
a givenimage
suffersfrom
being
image dependent. The
popular DCT is
capable ofhigh
energy
compaction ofthe
image,
has fast implementation
techniques
but
lacks
good spatiallocalization
due its bases
(sines
andcosines)
being
of
infinite
duration.
The
DWT
is
well suitedto
images because
ofits
good spatial andfrequency
localization
properties anddue
to
low-pass
nature of most naturalimages
Chapter
3
Review
of
Quantization
Theory
Quantization
is
the
heart
ofany
lossy
compression system.In its
simplestform,
a quantizer observes a single number and selects
the
nearestapproximating
valuefrom
a predeterminedfinite
set of allowed numerical values.Generally,
the
input
to
a quantizer
is
capable oftaking
onany
valuefrom
a continuous range of possible amplitudes andthe output, any
one ofthe
N
predetermined values.In
all cases of practicalinterest,
N is
finite
sothat
afinite
number ofbinary
digits is
sufficientto
specify the
output value.3.1
Scalar
Quantization
A
quantizeris
saidto
be
scalarif it
operates on oneinput
at atime.
In
otherwords,
it is
a one-dimensional quantizer.Associated
withevery
N
point quantizeris
a partition ofthe
realline 71
into
Ar cellsTZi,
for i
1, 2,
.N.
The
z'th cellis
given
by
TZi
{x
e
71
:Q(x)
=*/,}
=Q~l(yi),
the
inverse
image
ofj/,-under
the
quantization operation
Q. A
cellthat
is
unboundedis
called an overload cell.Each
bounded
cellis
called a granular cell.Together
all ofthe
overload(granular)
cells arecalled
the
overload region(granular
region).1, 2,
,A^}
andthe
corresponding
partition cells{TZf,
i
=1,
2,
,A^}.
A
quantizeris
saidto
be
regularif
Each
cellTZi
is
aninterval
ofthe
form
(xi-i,Xi)
together
with one orboth
ofthe
end points.yi
G
(x,-_i,x,-).
The
valuesXi
are often calledboundary
points,
decision
points,
or endpoints andthe
valuesj/t-are referred
to
as outputlevels,
outputpoints,
or reproduction values as shownin Figure 3.1.
A
regular quantizerhas
the
characteristicproperty that
if
two
input
values, say
a andb
with a<
b,
are quantized withthe
same outputvalue, w,
any
otherinput
that
lies between
a andb
will alsobe
quantizedto the
same outputvalue w.
.V .v ,V .K .v
,1 ,2 ,3
N-l
N
AAA
]
A
'
A3
yyy
v
y
12
3
N-l
N
Figure 3.1:
A
one-dimensional
quantizerwith
the
end points and output points marked onit.
The
performanceof a quantizeris
ultimately
dependent
on allthe
partitionbound
ary
values and on all ofthe
output points as well as onthe
input
statistics.Another
parameter
that
has
animportant
influence
on quantizer performanceis
the
loading
factor,
7,
which measuresthe
size ofthe
highest decision
level,
:rjv-i,
relativeto the
rms
(root
meansquared)
value, a,
ofthe
input
signal.Specifically,
the
loading
factor
is
givenby
V
where
V
is
the
peak signal magnitudethat
canbe
quantized withoutincurring
anexcessiveoverload error.
Zero
mean signals are consideredhere
sincethe
maximumsignal magnitude
is
taken to
be half
the
range(effectively
assuming that
zero signalis in
the
middle ofthe
range)
and square root ofthe
varianceis
consideredinstead
ofthe
average power.The
loading
fraction
,
(3,
is
the
reciprocal ofthe
loading factor,
i.e.,
(3
=I/7.
A
reasonable choice ofloading
factor
wouldbe
in
the
range of2
4.
Since,
the
loading
factor
is
indicative
ofhow
wellthe
quantizeris
matchedto the
input,
the
SNR
performancedepends
critically
onit.
3.2
The
Uniform Quantizer
The
most common of all scalar quantizersis
the
uniformquantizer,
sometimesknown
as
the
"linear" quantizerbecause
its
staircaseinput-output
responselies
along
astraight
line
(with
unitslope)
as shownin Figure 3.2.
All
quantizers areinherently
non
linear
in
the
proper sense ofthe
term.
A
uniform quantizeris
a regular quantizerthat
has
the
following
characteristics :The
boundary
points areequally spaced,
i.e.,
withstep
sizeA,
X{
xt_i
=A
for i
=2,3,--,N.
The
output orreproductionlevels
for
the
granular cells arethe
midpoints ofthe
quantization
interval,
i.e.,
yi
=(x;_i
+
xi)/2for
i
=2, 3,
,Ar
1
Thus,
the
partition ofthe
granular region consists ofintervals
oflength
A
andfor
unbounded
inputs,
the
quantizerhas
overload cells(00,21]
and(xn-i,+oo)
withy1
=xi
A/2
andt/jy
=;v-i
+ A/2. Note
alsothat
y,- =Output
H
1
H
Input
Figure 3.2: The
Staircase
type
input-output
characteristic of a quantizer.3.2.1
Average Distortion
The
purpose of quantizationis
to
provide alimited
precisiondescription
of a previously
unknowninput
sample.It is
only
because
the
input is
notknown
in
advancethat
it is
necessary to
quantize.Thus
the
input
mustbe
modeled as a random variable with a certainpdf.
Consequently,
the
errorintroduced in
this
sample will alsobe
random.
To
conveniently
assessthe
performance of a particular quantizer,there
is
aneed
for
a numberwhichis
a measure ofthe
overallquality
degradation
ordistortion
incurred
by
quantizing the
input.
A
lot
ofdifferent
distortion
measures are available(for
e.g., the
absoluteerror,
the
Mahalanobis distortion measure)
andgenerally
the
use of a measureis
dictated
by
the
particular application.The
most common measure ofthe
distortion
is
the
squared error
defined between
any two
numbers asd(x,x)
=\x
When
appliedto quantizers,
xis
the
input
and x =Q(x),
the
quantized output.For
an overall measure of performance of a quantizer a worst-case value or a statistical
average of some suitable measure of
the
distortion
canbe
used.If
the
squared errordistortion
measureis
used, the
worst-case squared erroris
simply the
square ofthe
maximum error.
The
problem with such a measureis
that
it depends
only
onthe
support region of
the
quantizer.Therefore,
for
an unboundedinput
with afinite
quantizer
(with
afinite
supportregion), the
worst-case erroris infinite
andhence
is
not a useful measure ofdistortion.
The
statistical measure ofthe
distortion is
usually
a moreinformative
and meaningful performance measure.In
general,
it
canbe
written asD
=E[d(X,Q(X))}
/OO
d(x,Q(x))fx(x)dx
(3.2)
-OO
where
fx(x)
is
the
pdf ofX.
This
measure ofdistortion is
usually
referredto
asthe
averagedistortion
orthe
mean squared error(MSE).
For
a giveninput
randomvariable,
X,
andquantizer,
Q
={yi,
Ri\
i
=1,2,-,
N},
the
averagedistortion
canbe
expressed asD
=E[(X
-Q(X)f]
=/(*-
yiffx(x)dx. t=iJRi
For
the
case of a regular quantizerthe
averagedistortion
canbe
expressed asD
=Ef(i-
yi)2fx(x)dx.What
makesMSE
one ofthe
most popular measures ofdistortion
is
that
it is
very easy to
workwith,
i.e.,
computational simplicity.it
penalizeslarger deviations
of outputsfrom inputs
morethan
the
smallerones,
However,
adownside
to
using
the
MSE in image
compression systemsis
that
it
does
not relate wellto
the quality
ofthe
reconstructedimage.
Let
us now considerthe
case of a uniform quantizer wherethe
input is bounded
with values
lying
in
the
range(a, b)
with sizeB
=b
a.Then
a =Xo
andb
xn
and
the
rangeis
divided
into N
equal quantizationcells,
each ofsizeA
=B/N. The
resulting
quantizerhas N
outputlevels,
ortreads,
withstep
sizeA
corresponding to
a staircase
input-output
characteristic with eachstep
of equal size.The
risers alsohave
sizeA.
It is
notdifficult
to
showthat
regardless ofthe
shape ofthe
input
pdf,if
the
rangeof
the
input
has
sizeB
then the
maximum possible erroris B/2N
=A/2
andthe
uniform quantizer minimizes
the
maximum error.This
givesthe
uniform quantizer ausefulrobustness so
that
it
maintains agood performancefor
a widevariety
ofinput
signals.
When
the
input
pdfis
uniformly
distributed
overthe
granularregion,
asin
the
caseof a uniform
quantizer,
it is
readily
seenthat the
quantizationerror, q
=[X
Q(X)\
has
a uniformpdf overthe
region [A/2, A/2],
i.e,
/g(g)
=1/A,
whereA
=B/N is
the
cell width.The
averagedistortion,
D,
from
equation(3.2)
is
then
givenby
D
=E[d(X,Q(X))]
=
E[(X-Q(X)f]
=Etf]
/A/2 =The
averagedistortion
giventhe
cellis
thus
simply
the
variance of a random variablethat
is
uniformly
distributed
on aninterval
of widthA,
that
is,
A2/12.
Thus
the
mean of
the
quantization erroris
givenby
E(q)
=0.
The
uniformdistribution
assumptionfor
quantization noiseholds
exactly
in
this
case.It
can alsobe
shownthat
a common quantizer noiseapproximation,
i.e.,
the
signaland noise are
uncorrelated,
is invalid in
this
case sinceA2
EUX)
---and
hence
the
signal andthe
quantizer error are not uncorrelated.In
practice,
mostinput
signals ofinterest
are unbounded.In
this case, there
aretwo
overload cells(00,
Xi]
and(xyv-i,
+00)
andL
=N
2
granular cells.The
mostsignificant consequence of
allowing
unboundedinputs is
that the
maximum erroris
now
infinite.
However,
mostinputs
ofinterest have
rapidly
decreasing
tail
probabilities
for
their
pdf's
sothat
for
suitable quantizerdesigns
the
overload region wouldhave
avery
low
probability
ofcontaining
aninput
sample.Hence,
performance results
for
uniform quantization withbounded
inputs
areapproximately
valid whenthe
input
is
unbounded aslong
asthe
loading
fraction
is sufficiently
small.Nevertheless,
it is important
to
realizethat
in
general, the
averagedistortion,
D,
is
due
to
combinedeffects of granular and overload
distortions, i.e.,
D
=Dgran
+
Dol
(3.4)
where
DgTan
denotes
the
granulardistortion
andD0\,
the
overloaddistortion,
whichcan
be
shownto
be
/X\ /"
+00
(x-yi)2fx(x)dx+
(x-yN)2fx(x)dx.
(3.5)
-00 JXj^_1
3.2.2
Signal
to
Noise Ratio
In
mostcases,
a uniform quantizeris
designed
to
operate withhigh
resolutionsothat
N is
large,
the
probability
of overloadis
very small,
andthe step
sizeis
much smallerthan the
rms signallevel.
As
aresult,
some convenient approximations canbe
madeto
obtain general performance resultsthat
areindependent
ofthe
specificinput
pdf.The
signalto
noise ratio(SNR)
for
ahigh
resolution uniformquantizer withstep
size
A
canbe
calculated easily.Using
the
fact
that
A
=B/N
=2V/N
=2a^/N
,the
averagedistortion D in
equation(3.3)
canbe
rewritten asDgTan
=^272A-2
=^2722-2'
where
the
resolutionr =log2
N
andthe
subscript "gran"has
been
addedto
emphasizethat
the
overloaddistortion has been
neglected.Assuming
that the
signalhas
zeromean
(or,
equivalently, measuring
powerby
the
variance)
the
SNR
is
givenby
gAE
=101og10signalpWer =
101og10(3/?22-)
(3.6)
noise powerusing the
loading
fraction,
0
=I/7.
This
simplifiesto the
formula
SNR
=6.02r +
d
(3.7)
where
the
constantC\
= 101og103/32is
ordinarily
negativefor
reasonableloading
factors.
Note
that
the
SNR
increases
with0
untiloverloading
occurs.Although
these
results assumehigh
resolution,
it
turns
outthat
they
arefairly
accuratefor
even
moderately
low
resolutions.Equation
(3.7)
givesthe
wellknown
rule ofthe thumb:
the
SNR
increases
by
6
dB
for
each additionalbit
usedto
quantize aninput
sample.This
formula
suggeststhat the
performancewillcontinueto
improve
withoutbound
asthe
loading
fraction,
that
overload noise canbe
neglected.As
the
loading
factor
decreases
to
valuesbelow
2
or3,
the
contribution ofthe
overload noisebecome
significant.Therefore,
a morecareful
derivation
ofSNR
mustbegin
withthe
formula for
the
averagedistortion
asin
equation
(3.4).
Using
equation(3.5)
in
equation(3.4),
the
SNR
canbe
computedby
taking
into
accountboth
the
overload andthe
granulardistortions.
It is
ofparticularvalue
to
understandhow
the
SNR depends
onthe
loading
factor
whenoverload effectsare
included.
Figure
3.3
shows a plotbetween SNR
of a uniform quantizer andits
loading
fraction,
whichis
proportionalto the
input
signallevel
for
afixed
quantizer.For
afixed
rater, the
SNR
increases
steadily
with a slope ofunity
onthe
log-log
scaletill the
overload noise comesinto
play
andthen
it
decreases
fairly
rapidly.The
rateof
decrease due
to
overload noisedepends
onthe tail
characteristics ofthe
particularinput
pdf.SNR
in
dB
Loading
Fraction in dB
Figure 3.3:
SNR
vs.Loading
fraction(/3)
for
uniform quantizers.Note
that
for
each additionalbit
ofresolution,
the
SNR
curve risesvertically
by
6
power
levels
for
whichthe
SNR
remains closeto
its
peak value.In
otherwords, the
performance of a uniform quantizer
is
quite sensitiveto
the
input
signallevel. As
aresult, very
high
resolutions are neededto
obtainsatisfactory
performancefor
signalswhose power
levels
are notaccurately
known in
advance orvary
withtime.
3.3
The
Optimal Quantizer
The
principal goal of quantizerdesign
is
to
selectthe
reproductionlevels
andthe
partition regions or cells so as
to
providethe
minimum possible averagedistortion
for
afixed
number oflevels N
or, equivalently,
afixed
resolution r.In
general, this
problem
does
nothave
any explicit,
closed-form solution.Effective
algorithms arehowever
readily
available.More
explicitly,
for
the
average(mean
square)
distortion
measure, the
goal,
asusually stated,
is
to
find
the
output pointsy8-and partition cells
Ri
that
minimizeN
D
=I2
(x-yi?fx(x)dx
(3.8)
j=i
JR<
where
fx(x)
is
the
pdf ofthe
random variableX.
The
problem offinding
an optimal quantizeris
anintriguing
one.There
aretwo
critically
important
conditionsthat
arenecessary
for
optimality,
which are simpleto
derive
and understand.These
conditions aredealt
within
the
following
two
sections.3.3.1
The
Optimal Encoder
for
a
Given
Decoder
Consider
the task
offinding
the
optimal encoderfor
a givendecoder.
Equivalently,<