Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
8-1-2002
Analysis and hardware implementation of color
map inversion algorithms
Michael Martin
Analysis and Hardware Implementation of
Color Map Inversion Algorithms
by
Michael W. Martin
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
MASTER OF SCIENCE
Computer Engineering
Approved By:
Principal Advisor
Kenneth Hsu, Professor, Computer Engineering
Committee Member
Soheil Dianat, Professor, Electrical Engineering
Committee Member
Athimoottil Mathew, Professor, Electrical Engineering
Department Head
Andreas Savakis, Associate Professor and Department Head, Computer Engineering
Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
REPRODUCTION PERMISSION STATEMENT
PERMISSION DENIED
Analysis and Hardware Implementation of
Color Map Inversion Algorithms
I, Michael W. Martin, hereby deny permission to any individual or organization to
reproduce this thesis in whole or in part.
Michael W. Martin
Abstract
The purpose of this thesis is to investigate several algorithms that are used to compute the inverse of a forward printer map. The forward printer map models the printer by mapping points in the printer's input color space to points in the printer's output color space. The inverse of this forward map is required to convert input color specifications in a device-independent color space to a color in the printer's device-dependent color space before being presented to the print engine. The accuracy of the inverse printer map directly affects the accuracy of the reproduced colors. Therefore, any measured change in the forward printer map requires re-computation of the inverse map if accurate and consistent color reproduction is to be maintained. An efficient and accurate method of computing the inverse map could be used in an automatic color correction system.

Three algorithms for computing the inverse of the forward printer map are studied in this thesis project. These are the Shepard's, Moving Matrix, and Iteratively Clustered Interpolation (ICI) algorithms. The algorithms are implemented in C and simulated in order to benchmark their relative accuracy, speed, and complexity. The simulations show the ICI algorithm to be the fastest and most accurate at computing the inverse map, and its complexity does not far exceed that of the other algorithms. The ICI algorithm was implemented in VHDL and synthesized to a Synopsys generic library in order to determine the approximate size and speed of an ASIC that could perform the inverse computation. The final implementation resulted in two modules: one that implements the ICI algorithm, and one that implements the trilinear interpolation function that is used by [...] synthesized trilinear interpolation module contained 190,357 cells. The timing of the modules resulted in a 40 nanosecond clock period, which corresponds to a maximum operating frequency of 25 MHz. These synthesized results show that this algorithm is [...]

Table of Contents
1 INTRODUCTION
2 BACKGROUND INFORMATION
2.1 The Xerographic Printing Process
2.2 Color Spaces
2.2.1 RGB Color Space
2.2.2 CMY Color Space
2.2.3 LAB Color Space
2.3 Gamut Mapping
3 DEVICE MODELING
3.1 Forward Printer Map
3.2 Inverse Printer Map
4 THEORY
4.1 Algorithms
4.1.1 Shepard's Algorithm
4.1.2 Moving Matrix Algorithm
4.1.3 Iteratively Clustered Interpolation Algorithm
4.1.4 Trilinear Interpolation
5 SOFTWARE IMPLEMENTATION
5.1 Algorithm Metrics
5.1.1 Determining Algorithm Complexity
5.1.2 Determining Algorithm Execution Speed
5.1.3 Determining Algorithm Accuracy
5.2 Optimal Parameters
5.2.1 Shepard's Algorithm
5.2.2 Moving Matrix Algorithm
5.2.3 ICI Algorithm
5.3 Simulation Results
6 ICI HARDWARE IMPLEMENTATION
6.1 Design Methodology
6.1.2 Data Storage
6.1.2.1 Operation
6.1.2.2 Memory Data Layout
6.1.3 Data Representation
6.2 Synthesis Methodology
6.3 Simulation & Testing
6.4 Hardware Modules
6.4.1 Trilinear Interpolation Module (tritop.vhd)
6.4.1.1 Operation
6.4.1.2 Sub-blocks
6.4.1.3 Summary of Results
6.4.2 ICI Module (icitop.vhd)
6.4.2.1 Operation
6.4.2.2 Input Description
6.4.2.3 Sub-blocks
6.4.3 Summary of Results
6.5 System Overview
7 CONCLUDING REMARKS
7.1 Conclusions
7.2 Future Work
7.3 Acknowledgements
8 REFERENCES
9 APPENDICES
9.1 Software Simulation Accuracy Results
9.1.1 Shepard's Interpolation
9.1.2 Moving Matrix
9.1.3 ICI
9.2 VHDL Code
9.2.1 TRI_TOP.VHD
9.2.2 TRI_CTRL.VHD
9.2.3 TRI_CX_BLOCK.VHD
9.2.4 TRI_TERM_BLOCK.VHD
9.2.5 TRI_ADDR_BLOCK.VHD
9.2.6 TRI_TB_TOP.VHD
9.2.7 ICITOP.VHD
9.2.8 ICI_CTRL.VHD
9.2.9 ICI_DP.VHD
9.2.10 DIS_DP.VHD
9.2.11 ADDR_CALC.VHD
List of Figures
Figure 1: The Xerographic Printing Process
Figure 2: Axes of the CIELAB color space
Figure 3: Gamut Mapping for Constant Lightness and Hue
Figure 4: System of $P^{-1}$ and P
Figure 5: Sub-cube with Lattice Points Determined During Extraction
Figure 6: 1D Linear Interpolation
Figure 7: Memory Module Diagram
Figure 8: Top Level Diagram of Trilinear Interpolation Module
Figure 9: Submodule Diagram of Trilinear Interpolation Module
Figure 10: Top Level Diagram of Trilinear Interpolation TRI_CTRL Block
Figure 11: Top Level Diagram of Trilinear Interpolation TRI_ADDR_BLOCK
Figure 12: Datapath Architecture of TRI_ADDR_BLOCK
Figure 13: Top Level Diagram of Trilinear Interpolation TRI_CX_BLOCK
Figure 14: Datapath Architecture of TRI_CX_BLOCK
Figure 15: Top Level Diagram of Trilinear Interpolation TRI_TERM_BLOCK
Figure 16: Datapath Architecture of TRI_TERM_BLOCK
Figure 17: Top Level Diagram of ICI Module
Figure 18: Submodule Diagram of ICI Module
Figure 19: Top Level Diagram of ICI_CTRL Block
Figure 20: Top Level Diagram of ICI ADDR_CALC Block
Figure 21: Datapath Architecture of ICI ADDR_CALC Block
Figure 22: Top Level Diagram of ICI DIS_DP Block
Figure 23: Datapath Architecture of ICI DIS_DP Block
Figure 24: Top Level Diagram of ICI_DP Block
Figure 25: Datapath Architecture of ICI_DP Block
Figure 26: Diagram of System to Compute $P^{-1}$
List of Tables
Table 1: Optimal Algorithm Parameters
Table 2: Algorithm Accuracy
Table 3: Algorithm Execution Time
Table 4: Algorithm Complexity
Table 5: Trilinear Interpolation Module I/O Signals
Table 6: Trilinear Interpolation Module Synthesis Results
Table 7: ICI Module I/O Signals
1 Introduction

One of the most crucial challenges facing today's printer manufacturers is to design printers that accurately and consistently reproduce the colors in images [1]. This is a difficult task because the transformation from the printer's input color space to its output color space is non-linear and drifts over time. Color reproduction is also affected by the physical properties of the media, including temperature, moisture content, brightness, and weight. These variables are constantly changing and can lead to inaccurate and inconsistent color reproduction. When this occurs, the printer must be re-calibrated. The calibration process is typically manual and time-consuming and reduces the productivity of the printer.

A color correction system is needed to account for these variations in order to accurately and consistently reproduce the colors in printed images. Solutions that can automate the color correction process will lead to very productive printers with little "downtime". There is an abundance of research activity aimed at developing efficient and effective color calibration and control systems [6, 8, 10]. The forward and inverse printer maps are central to many of these systems.

A forward printer map is a practical and accurate model of a printer. This forward map takes the form of a multidimensional look-up table and is constructed by performing input-output color experiments on an actual printer. The inverse of this forward map, called the inverse printer map, is needed to convert an input color specification in a device-independent color space to a color in the printer's device-dependent color space before being presented to the print engine. If the inverse printer map is accurate, the print engine will produce a color reproduction that closely matches the original device-independent color specification. Any measured change in the forward printer map requires re-computation of the inverse map if color accuracy and consistency are to be maintained. A method for accurately and efficiently computing the inverse of the forward map is sought.

Section 1 of this paper is this brief introduction. In Section 2, several important topics relating to this project will be discussed. Section 3 describes how printers are modeled. Section 4 presents the problem at hand and the theory behind the problem. The inversion algorithms are also discussed in this chapter. Section 5 describes the software implementation and results. Section 6 describes the hardware implementation and results. Finally, Section 7 concludes the paper with a discussion of conclusions, future work, and acknowledgements. References and Appendices can be found at the end of the paper. The CD accompanying this paper includes the paper itself as well as data files,

2 Background Information

2.1 The Xerographic Printing Process

Imaging systems are networks consisting primarily of video display monitors, scanners, and printers. Each of these devices has its own device-specific color space; generally RGB for monitors and scanners and CMY for printers. These devices communicate image data with each other using a psycho-physically based and device-independent color space, such as LAB.

A specification in the device-independent color space is transformed into a device-dependent color space using a multidimensional color correction table or inverse color map. This inverse map is based on the forward transformation characteristics of the printer. Once converted into the device-dependent color space, the color-separated image is converted to halftones and delivered to the print engine that is responsible for the physical production of the print [8]. This process is illustrated in Figure 1.

[Figure 1: The Xerographic Printing Process. Color specification (LAB) -> printer pre-processor (inverse color map) -> CMY -> print engine (forward color map) -> color reproduction (LAB).]

2.2 Color Spaces

Color is specified in a three-dimensional space where each point in the space describes a single color. There are several such color spaces, such as RGB, CMY, and LAB. This project focuses on printers; therefore, this research work was done using the CMY and LAB color spaces described in this section. The RGB color space is also described in order to further explain and contrast the additive and subtractive nature of the color systems. Although not used directly in the printing process, RGB is the color system used on the displays that show the images prior to printing and to which the color reproduction is often compared.

2.2.1 RGB Color Space

In the RGB space, a color consists of the primary components red, green, and blue. The RGB system is additive, meaning that varying amounts of the primaries are added together to produce a given color. The additive nature of the RGB system makes it the color system of choice for devices that are self-illuminating light sources, such as displays and scanners. This color space is known as a device-dependent color space because the same RGB color specification will vary perceptually on different devices.

2.2.2 CMY Color Space

Another device-dependent color space is the CMY space. A color in this space consists of the primary components cyan, magenta, and yellow. The CMY system is subtractive, meaning that varying amounts of the primaries are used together to absorb certain colors; those colors that are reflected, rather than absorbed, combine to form the perceived color. Color prints are illuminated by an external light source, and the perceived colors are due to the combination of reflected wavelengths. Thus, a printer's input color space is typically described in CMY.

Printers typically add a fourth ink, black, which results in four-color CMYK. The black ink is added for several reasons. To produce black using only the CMY primaries, 100 percent of each ink must be used. This is a disadvantage because color inks are more expensive and because using 100 percent of the three primaries results in a very wet print that is susceptible to wrinkles and takes a long time to dry. Furthermore, the combination of full CMY usually results in more of a muddy brown than a black color. Also, because the overlay of inks may not be exact, black portions of a print may have edges that do not have full CMY coverage. Finally, including black in the color mix can increase the contrast of colors, thereby increasing a printer's color gamut.

Replacing some percentage of the three primaries with black is achieved by gray component replacement and under color removal [7]. A color can be specified using only these primaries; the black component is added later by a separate and independent processing step. In this project, we need only consider the three primaries cyan, magenta, and yellow.

2.2.3 LAB Color Space

The LAB (CIELAB) color system was developed by the Commission Internationale de l'Eclairage (CIE) in 1976 [7]. Any given light has a distribution of wavelengths that is called its spectral power distribution, and this system is based on the spectral power distribution of colors. The CIE defined three color-matching functions that are based on the human psycho-physical perception of color. The integral of the spectral power distribution is weighted by the three color-matching functions to provide what are known as the tristimulus values of the color. The LAB system defines a three-dimensional color space with tristimulus values L, a, and b. The L component is a measure of the lightness of a color. This axis varies from 0 (black) to 100 (reference white). The a component is a chromatic measure of red-green, and the b component is a chromatic measure of blue-yellow. The maximum of a, specified by +a, is only red with no green, while the minimum of a, specified by -a, is only green with no red. Similarly, the maximum of b, specified by +b, is only yellow with no blue, and the minimum of b, specified by -b, is only blue with no yellow. The axes of this color space are shown in Figure 2.

The LAB color space is device-independent; color specifications using this system are representative only of the color's spectral power distribution and are in no way related to any particular device. Colors with identical spectral power distributions are perceived to be exactly the same color, provided they are seen in the same visual environment, regardless of the manner in which the color is being displayed. This color space is the industry standard for describing colors in a manner that is independent of any device.

2.3 Gamut Mapping

Printers are incapable of reproducing every single color in a given color space. The set or range of colors that a printer can reproduce is known as the printer's color gamut. An input color specification can be located either inside or outside of this gamut. A printer can immediately reproduce a color that is inside the gamut. Colors located outside of the gamut are not reproducible by the printer and must be mapped to a color in the gamut. This section describes the gamut mapping process that was used in this project [4].

Gamut mapping was necessary after constructing the structured input of the inverse printer map (see Section 4). The process consists of two steps: first determining whether an input color specification (input point) is located inside or outside the gamut, then mapping colors that are outside of the gamut to colors inside the gamut.

The procedure to determine whether an input point is located inside or outside of the printer gamut begins by upsampling the forward printer map P. This $13^3$ map is upsampled to $64^3$ using trilinear interpolation. The upsampling creates a continuous 3D solid with a smooth boundary in the CMY and corresponding LAB color spaces. The boundary points of the CMY structure are assumed to correspond to the boundary points of the LAB structure. This is a practical and acceptable assumption for most xerographic printers.

The input point and the boundary points of the LAB structure are shifted with respect to the centroid of the LAB structure. This centroid is the average of all of the points in the gamut, approximately L=50, a=0, and b=0 for most xerographic printers. These points are then converted to spherical coordinates $r$, $\alpha$, and $\theta$. The conversion is given by the following equations:

$$r = \sqrt{(L - L_E)^2 + (a - a_E)^2 + (b - b_E)^2}$$

$$\alpha = \arctan\left(\frac{b - b_E}{a - a_E}\right)$$

$$\theta = \arctan\left(\frac{L - L_E}{\sqrt{(a - a_E)^2 + (b - b_E)^2}}\right)$$

In the above equations, $[L_E \; a_E \; b_E]$ specifies the centroid of the LAB structure, $r$ is the distance from the centroid to the input point, and $\alpha$ is the hue angle, with a range of 360 degrees. $\theta$ is the angle in a plane of constant $\alpha$, with a range of 180 degrees.

The set of boundary points is searched for those points that fall into the range $\alpha \pm \Delta\alpha$ and $\theta \pm \Delta\theta$, which forms a cone around the input point. The average distance $r$ is calculated for all boundary points that fall within this cone. If the distance parameter of the input point is greater than this average, then the input point is considered to be outside of the gamut. Conversely, if the input point's distance is less than the average, it is considered to be inside the gamut.
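
The inside/outside test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's C code: the function names (`to_spherical`, `angle_diff`, `is_outside_gamut`) and the half-widths `d_alpha` and `d_theta` are hypothetical, `atan2` is used in place of a plain arctangent so that the hue angle covers the full 360-degree range, and the wrap-around handling of the hue angle is an assumption.

```python
import math

def to_spherical(lab, centroid):
    """Convert an LAB point to (r, alpha, theta) about the gamut centroid."""
    dL = lab[0] - centroid[0]
    da = lab[1] - centroid[1]
    db = lab[2] - centroid[2]
    r = math.sqrt(dL * dL + da * da + db * db)
    alpha = math.atan2(db, da)                  # hue angle, 360-degree range
    theta = math.atan2(dL, math.hypot(da, db))  # elevation, 180-degree range
    return r, alpha, theta

def angle_diff(a, b):
    """Smallest absolute difference between two angles (assumed wrap-around)."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def is_outside_gamut(point, boundary, centroid, d_alpha, d_theta):
    """Compare the point's radius with the mean boundary radius inside the cone."""
    r, alpha, theta = to_spherical(point, centroid)
    radii = []
    for bp in boundary:
        rb, ab, tb = to_spherical(bp, centroid)
        if angle_diff(ab, alpha) <= d_alpha and abs(tb - theta) <= d_theta:
            radii.append(rb)
    if not radii:
        raise ValueError("no boundary points fall inside the search cone")
    return r > sum(radii) / len(radii)
```

A wider cone averages more boundary points and smooths the decision, while a narrower cone follows the local boundary more closely but may contain no points at all, which is why the sketch raises an error in that case.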

A point outside the gamut should be mapped to a point in the gamut such that the original and mapped points are as perceptually close as possible. Research has shown that the perceptual difference between two colors is lowest when the lightness and hue of the colors are equal [5, 7]. Therefore, the mapping approach used for this project aimed to preserve the lightness and hue of the original point. The boundary of the LAB structure was searched to find the point that best satisfied the mapping criteria described above and given by the following equations:

$$L = L'$$

$$\alpha' = \arctan\left(\frac{b'}{a'}\right) = \arctan\left(\frac{b}{a}\right)$$

$[L \; a \; b]$ specifies the original input point and $[L' \; a' \; b']$ specifies the mapped point. The mapping is illustrated in Figure 3.

[Figure 3: Gamut Mapping for Constant Lightness and Hue. Note: a-b plane for constant L shown (L = L').]

3 Device Modeling

Device color calibration requires a model of the device. These models are generally either theoretical or empirical. The advantage of theoretical models is that color prediction can be performed with relatively few actual measurements. However, these analytical models are usually not very accurate because they do not adequately capture the nonlinearities of real systems, which are due in large part to external variables. Examples of such external variables are temperature, humidity, and the weight and type of printable media. These variables have a significant effect on the colors that are reproduced. The Neugebauer equations are a classic example of color prediction using a theoretical model.

Polynomial regression is one type of empirical model. This model suffers from accuracy problems; prediction of colors close to the sample points used for the regression provides acceptable accuracy, but for the rest of the points in the gamut there is no guarantee [9]. The most accurate and practical empirical model, and therefore the industry standard, is a look-up table containing input-output pairs that characterize the device. This model is discussed in detail in the following section.

3.1 Forward Printer Map

A color printer can be viewed as a device that maps colors from an input color space to an output color space. The printer can be modeled by a look-up table containing points in the input color space and their corresponding mapping to points in the output color space. The full model can be realized by interpolation for input points not contained in the look-up table. The look-up table is an approximation to the actual printer function; the larger the look-up table, the better the approximation. This look-up table characterizes the printer function and is referred to as the forward printer map. The forward printer map is denoted P.

The forward printer map is constructed experimentally by printing colors in the device's input color space on paper and measuring these color patches in the output color space using a spectrophotometer. The disadvantage of this model is that many color experiments must be performed in order to construct a forward map that is an accurate [...] especially because this process is currently non-automated. For example, to cover the entire gamut of a typical printer, a 10x10x10 entry table resulting in 6,000 parameters (1,000 entries, each holding three input and three output components) is needed for acceptable modeling accuracy [6, 8].

The input of the forward printer map can be structured by sampling the printer's input space in equal steps along each axis of the source domain. These sampled points become the "node", "grid", or "lattice" points of the forward map. For n levels of division, this gives $(n-1)^3$ cubes and $n^3$ lattice points. The $n^3$ lattice points of the source space are printed, and these color patches are measured to determine the output color space specification. The corresponding values from the source and destination spaces populate the look-up table. Note that the output of P will be unstructured (or non-uniform) because the transformation from input color space to output color space is typically highly nonlinear.

3.2 Inverse Printer Map

The inverse printer map, denoted $P^{-1}$, is constructed from the forward printer map P. The inverse map associates points in the output color space with points in the device's input color space.

Simply swapping the data from P is not desirable for two reasons. First, the inverse look-up table will not be well defined for colors near the gamut boundary because it is possible that more than one input point is mapped to the same output point. Second, it results in a look-up table with unstructured input. Interpolation with data that has unstructured input is complex and time-consuming. On the other hand, if the input is [...]

Geometrical linear interpolation is preferred over other non-linear interpolation techniques because it is more efficient and accurate. There are several geometrical interpolations, including trilinear, prism, pyramid, and tetrahedral [5]. Each of these varies in terms of accuracy, extraction (or search) complexity, and interpolation (or computation) complexity. In this thesis project, trilinear interpolation was utilized for look-up table interpolation (see Section 4.1.4).

Instead of simply swapping the data from P, algorithms are required that interpolate irregularly sampled multi-dimensional data from P. These algorithms must be capable of efficiently computing the inverse map so that it has structured input points and so that it is as good an approximation to the true inverse as possible. Several of these algorithms are discussed in the next section.

4 Theory

Consider the forward printer map P that maps a point y in the printer's input color space to a point z in the output color space. Also consider the inverse printer map $P^{-1}$ that maps a point x in the target color space to a point y in the printer's input color space. Given a point x in the target color space, the inverse color map is used to find a point y in the printer's input color space that the printer will map to a point z in the output color space.

[Figure 4: System of $P^{-1}$ and P. x -> $P^{-1}$ -> y -> P -> z]

The desired result is for the printer to print a color z that, when measured, will be as close as possible to the requested color x. The difference between the target and output colors can be quantified by computing the Euclidean distance between them. This difference is known as $\Delta E(P(y), x)$ or simply $\Delta E$ and is given by Equation 1.

$$\Delta E = \|z - x\| = \|P(y) - x\| \qquad (1)$$

Note that $z = P(y)$.

In general, and specifically for this project, the target and output color space is the device-independent LAB color space. For the target color $(LAB)_{in}$ and the output color $(LAB)_{out}$, Equation 1 becomes:

$$\Delta E = \sqrt{(L_{out} - L_{in})^2 + (A_{out} - A_{in})^2 + (B_{out} - B_{in})^2}$$

The smaller the value of $\Delta E$ for a given input, the better the inverse map. Humans do not perceive color differences less than 1.0. $\Delta E = 1.0$ is known as the "just noticeable color difference." This leads to the following optimization problem:

Given a target color x and the forward printer map P, find the printer input y that solves

$$\min_{y} \; \Delta E(P(y), x) \qquad (2)$$
this optimization problem results in the inverseprinter map P'1. The structured
inverseprinter
look-up
table isconstructed accordingto themethodologyoutlined in[1],
re-statedbelow:
1. Obtainthe estimated forwardprintermap,
P,
ofa given color printer. This isachieved
by
equally sampling theprinter input spaceY
andthen theprintedcolor Zj for each grid node y is obtained from experiments on the actual
printer. In addition, some interpolation technique is used to estimate the
outputscorresponding toinputpointsthatarenotgridnodes.
2. Gridthe target color spaceXXo obtain an ordered collection of vectors Xj, i =
1,
2,
..., h (each x{ corresponds to a grid node). Thiswill result in gridnodes
that are outside ofthe printergamut; thesepoints must be firstmapped onto
thegamutusingthe approachdescribed in Section 2.3.
3. Obtainthe yf thatsolvesEquation2 forXj and
P,
fori=4. Select a mutidimensional
interpolation
technique to obtain the inverse mapoutputs ycorrespondingto inputpointsxthat arenotexactlygridnodes.

Several algorithms have been proposed for solving the optimization problem given by Equation 2. These algorithms can vary widely in terms of accuracy, numerical stability, speed, and complexity. Three of these algorithms are Shepard's Interpolation [3], Moving Matrix [2], and Iteratively Clustered Interpolation [1]. These algorithms were the subject of study in this thesis. The details of each algorithm are discussed in the following sections. Trilinear interpolation is used frequently by the ICI algorithm to interpolate through P and $P^{-1}$; it is discussed at the end of this section.

4.1 Algorithms

The algorithms will be presented in an abstract sense, where x represents colors in the target color space, y represents colors in the printer's input color space, and z represents colors in the printer's output color space.

For the algorithms discussed in this section, recall that the forward printer map P contains N entries and defines the mapping function:

$$z_j = P(y_j), \quad j = 1, 2, \ldots, N$$

For the forward printer map used in this project, the printer's input color space was CMY, and its output color space was LAB. The algorithms can be applied to this specific application by remembering that x, y, and z are equivalent to the color space vectors:

$$x = [L \; A \; B], \quad y = [C \; M \; Y], \quad z = [L \; A \; B]$$

4.1.1 Shepard's Algorithm

Shepard's algorithm is based on the work of Donald Shepard and is a well-established method for interpolating scattered data [3]. It is essentially an application of weighted averaging. The value for a given input point is calculated by a weighted average of all other data points, where the weighting is a function of the distance from the given point to the other data points.

Given a color x in the target color space, Shepard's algorithm computes the inverse $\hat{y}$ by the following equation:

$$\hat{y} = \begin{cases} \dfrac{\sum_{j=1}^{N} d_j^{-\mu} \, y_j}{\sum_{j=1}^{N} d_j^{-\mu}} & \text{if } d_j \neq 0 \text{ for all } z_j \\[2ex] y_j & \text{if } d_j = 0 \text{ for some } z_j \end{cases}$$

where $d_j$ is the $L_p$ norm between the input point x and $z_j$.

There are two variable parameters in this algorithm: $p$ and $\mu$. The $p$ parameter specifies how the distances between points are calculated. The $\mu$ parameter affects the locality of the weighting function; large values of $\mu$ result in more local behavior, which means that only those points closest to the input point will be significant.

4.1.2 Moving Matrix Algorithm

The Moving Matrix algorithm computes the printer inverse using linear weighted least-squares regression [2]. The inverse is given by the following equation, where A is the transformation matrix.

$$y = x A^T \qquad (3)$$

The transformation matrix A is found by minimizing the weighted squared error given by the following equation:

$$E = \sum_{j=1}^{N} w_j \, \| y_j - z_j A^T \|^2 \qquad (4)$$

Differentiating Equation 4 with respect to A, setting it equal to zero, and solving for A yields the closed form solution:

$$A = Q R^{-1}$$

where

$$Q = \sum_{j=1}^{N} w_j \, y_j^T z_j \quad \text{and} \quad R = \sum_{j=1}^{N} w_j \, z_j^T z_j$$

The weighting is a function of the distances from the input point x to all of the other points $z_j$ contained in the forward printer map:

$$w_j = \frac{1}{d_j^{\mu} + \varepsilon}$$

Here $d_j$ is the Euclidean distance from the input point x to the point $z_j$.

The variable parameters of this algorithm are $\mu$ and $\varepsilon$. These parameters affect the locality of the regression. Large values of $\mu$ and small values of $\varepsilon$ give $w_j$ more local behavior, which means that only those points closest to the input point will be significant.

4.1.3 Iteratively Clustered Interpolation Algorithm

The Iteratively Clustered Interpolation (ICI) algorithm is a gradient-based optimization method that uses an iterative technique to generate initial points [1].

An unconstrained gradient-based optimization algorithm to solve Equation 2 would be given by [11]:

$$y(k+1) = y(k) - \beta \left. \frac{\partial \, \Delta E(y, x)^2}{\partial y} \right|_{y = y(k)}, \quad k \geq 0, \text{ for a given } y(0) \qquad (5)$$

Note that $\Delta E(y, x)$ is Equation 1, restated below:

$$\Delta E = \|z - x\| = \|P(y) - x\|$$

Thus,

$$\Delta E(y, x)^2 = \|P(y) - x\|^2 = (P(y) - x)^T (P(y) - x)$$

The gradient of $\Delta E(y, x)^2$ with respect to y for a fixed x is then:

$$\frac{\partial \, \Delta E(y, x)^2}{\partial y} = 2 J(y)^T (P(y) - x) \qquad (6)$$

Substituting Equation 6 into Equation 5 gives the following update equation:

$$y(k+1) = y(k) - \mu J_k^T (P(y(k)) - x) \qquad (7)$$

where $\mu = 2\beta$.
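
$P(y(k)) - x$ is a component vector of the differences between the target color x and the output color z produced by interpolating y(k) through the forward printer map P:

$$P(y(k)) - x = \begin{bmatrix} z_{k,0} - x_0 \\ z_{k,1} - x_1 \\ z_{k,2} - x_2 \end{bmatrix}$$

$J_k$ is the Jacobian of $(P(y(k)) - x)$ evaluated at $y = y(k)$, and is given by:

$$J_k = \begin{bmatrix} \dfrac{\partial z_0}{\partial y_0} & \dfrac{\partial z_0}{\partial y_1} & \dfrac{\partial z_0}{\partial y_2} \\[2ex] \dfrac{\partial z_1}{\partial y_0} & \dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} \\[2ex] \dfrac{\partial z_2}{\partial y_0} & \dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} \end{bmatrix}$$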

This matrix is also known as the gradient matrix. It is calculated using numerical differentiation of the forward printer map P. Each $\partial z$ is an average of the forward and backward finite differences that result from separately varying each component of y and interpolating through P.

Gradient-based optimization methods suffer from the drawback that a solution may be optimal only in a local sense, rather than being the global optimum. In order to avoid this situation, the initial estimate of the inverse must be close to the actual globally optimal solution. A novel and efficient procedure to determine a "good" initial estimate is presented in [1] and is re-stated as follows:

1. Search the $z_i$ points of P for the one that is closest to the given target point x. Call this point $z_{aux}$.
2. Find the corresponding point $y_{aux}$. This is the point such that $z_{aux} = P(y_{aux})$. $y_{aux}$ is a node point in the grid of the Y space, which is a point in the input space of the forward printer map P. This $y_{aux}$ is a coarse estimate for the inverse of x.
3. Select M points in the neighborhood of $y_{aux}$; generate a cluster of M points $(y_{aux,1}, y_{aux,2}, \ldots, y_{aux,M})$ by moving along axes around $y_{aux}$. Map these M points to obtain $z_{aux,j} = P(y_{aux,j})$, j = 1, 2, ..., M.
4. Find the closest of the $z_{aux,j}$ points to the input x and call this point $z_0$. Take the corresponding point $y_0$, where $z_0 = P(y_0)$, as the initial estimate.

After using this procedure to find the initial estimate $y(0)$, the update equation given by Equation 7 can be used. Define parameters $k_{max}$ and $\varepsilon$ that are the stop criteria of the iteration of the update equation.

$\varepsilon$ specifies the error threshold at which the update iterations can stop, such that

$$\|P(y(k)) - x\| < \varepsilon \qquad (8)$$

$k_{max}$ is the maximum number of iterations that the algorithm will perform when computing the inverse for a given point. It is useful to define this parameter as a stop criterion because not every input point will result in an error that is below the threshold $\varepsilon$. When either Equation 8 is satisfied or $k > k_{max}$, the algorithm stops and y(k) is taken as the solution.

The parameter $\delta$ for this algorithm specifies the perturbation used in the numerical differentiation method used to compute the gradient matrix.

The parameter $\mu$ should be selected in order to achieve fast convergence and to meet accuracy requirements. This parameter should be bounded by the following equation, as discussed in [1]:

$$0 < \mu < \frac{2}{[\ldots]}$$
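
The pieces of the ICI algorithm (the clustered initial estimate, the numerically differentiated gradient matrix, and the update of Equation 7 with its two stop criteria) can be put together in a short Python sketch. Several things here are assumptions made only for illustration: `toy_forward` is a hypothetical smooth function standing in for trilinear interpolation through a measured forward map P, the cluster is taken as the coarse estimate plus six points one `step` away along each axis (so M = 7), cluster points are clamped to the range [0, 1], and all function and parameter names are invented for this example.

```python
import math

def toy_forward(y):
    """Hypothetical smooth stand-in for interpolating through P (not measured data)."""
    c, m, k = y
    return (100.0 - 30.0 * c - 40.0 * m - 20.0 * k - 5.0 * c * m,
            60.0 * m - 40.0 * c + 5.0 * m * k,
            70.0 * k - 30.0 * c - 20.0 * m)

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def jacobian(P, y, delta):
    """J[i][j] = dz_i/dy_j, the average of forward and backward differences."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        hi, lo = list(y), list(y)
        hi[j] += delta
        lo[j] -= delta
        z_hi, z_lo = P(hi), P(lo)
        for i in range(3):
            J[i][j] = (z_hi[i] - z_lo[i]) / (2.0 * delta)
    return J

def initial_estimate(P, nodes, x, step):
    """Steps 1-4: nearest lattice node, then the best point of a small cluster."""
    y_aux = min(nodes, key=lambda y: dist(P(y), x))
    cluster = [tuple(y_aux)]
    for j in range(3):
        for s in (-step, step):
            q = list(y_aux)
            q[j] = min(1.0, max(0.0, q[j] + s))   # assumed clamp to [0, 1]
            cluster.append(tuple(q))
    return min(cluster, key=lambda y: dist(P(y), x))

def ici_inverse(P, nodes, x, mu, eps, k_max, delta, step):
    """Iterate Equation 7 until Equation 8 holds or k_max updates are done."""
    y = list(initial_estimate(P, nodes, x, step))
    for _ in range(k_max):
        e = [zi - xi for zi, xi in zip(P(y), x)]
        if math.sqrt(sum(c * c for c in e)) < eps:
            break
        J = jacobian(P, y, delta)
        for j in range(3):
            y[j] -= mu * sum(J[i][j] * e[i] for i in range(3))  # (J^T e)_j
    return tuple(y)
```

With a 5-level lattice (`nodes` holding the 125 points with coordinates 0, 0.25, ..., 1) and a step size of about 1e-4, this toy map converges in well under a thousand updates; a step size that is too large for the map's Jacobian makes the iteration diverge, which is the practical meaning of the bound on $\mu$.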

4.1.4 Trilinear Interpolation

Trilinear interpolation of a look-up table is a three-dimensional geometric method for computing the output values of input points that are not contained in the look-up tables [5]. For the sake of this discussion, the look-up table is either the forward or inverse printer map, and this interpolation technique is used with these maps for input points that are not grid points.

Trilinear interpolation consists of two steps: the extraction step and the interpolation step. In the extraction step, the sub-cube in the source color space that contains the input point is determined by a series of comparisons. The eight vertices of this cube are the lattice points in the source space, as shown in Figure 5.

[Figure 5: Sub-cube with Lattice Points Determined During Extraction. The eight vertices are $p_{000}(x_0, y_0, z_0)$, $p_{100}(x_1, y_0, z_0)$, $p_{010}(x_0, y_1, z_0)$, $p_{110}(x_1, y_1, z_0)$, $p_{001}(x_0, y_0, z_1)$, $p_{101}(x_1, y_0, z_1)$, $p_{011}(x_0, y_1, z_1)$, and $p_{111}(x_1, y_1, z_1)$.]
The interpolation step consists of the repeated use of one-dimensional linear interpolation. Referring to Figure 6, a point $p$ on the curve between lattice points $p_0$ and $p_1$ is to be interpolated.
Figure 6: 1D Linear Interpolation
The interpolated value, $p_c(x)$, is linearly proportional to the ratio $(x - x_0)/(x_1 - x_0)$. Therefore,
$$p_c(x) = p(x_0) + \left[ p(x_1) - p(x_0) \right] \frac{x - x_0}{x_1 - x_0}$$
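Three-dimensional trilinear interpolation consists of seven linear interpolations on the sub-cube depicted in Figure 5. This results in the following equations [5]:
$$\Delta x = x - x_0, \qquad \Delta y = y - y_0, \qquad \Delta z = z - z_0$$
$$c_0 = p_{000}$$
$$c_1 = \frac{p_{100} - p_{000}}{x_1 - x_0}$$
$$c_2 = \frac{p_{010} - p_{000}}{y_1 - y_0}$$
$$c_3 = \frac{p_{001} - p_{000}}{z_1 - z_0}$$
$$c_4 = \frac{p_{110} - p_{010} - p_{100} + p_{000}}{(x_1 - x_0)(y_1 - y_0)}$$
$$c_5 = \frac{p_{101} - p_{001} - p_{100} + p_{000}}{(x_1 - x_0)(z_1 - z_0)}$$
$$c_6 = \frac{p_{011} - p_{001} - p_{010} + p_{000}}{(y_1 - y_0)(z_1 - z_0)}$$
$$c_7 = \frac{p_{111} - p_{011} - p_{101} - p_{110} + p_{100} + p_{001} + p_{010} - p_{000}}{(x_1 - x_0)(y_1 - y_0)(z_1 - z_0)}$$
$$p(x, y, z) = c_0 + c_1 \Delta x + c_2 \Delta y + c_3 \Delta z + c_4 \Delta x \Delta y + c_5 \Delta x \Delta z + c_6 \Delta y \Delta z + c_7 \Delta x \Delta y \Delta z$$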
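The coefficient form above maps directly to code. The following is a minimal sketch for one scalar output component; the function name, the dictionary layout of the corner values, and the test function `f` are illustrative assumptions. A three-component output such as LAB would call it once per component.

```python
def trilinear(corners, lo, hi, point):
    """Interpolate inside one sub-cube.

    corners[(i, j, k)] holds the value at lattice point p_ijk, where i, j, k
    in {0, 1} select x0/x1, y0/y1, z0/z1. lo = (x0, y0, z0), hi = (x1, y1, z1).
    """
    p = corners
    (x0, y0, z0), (x1, y1, z1) = lo, hi
    x, y, z = point
    dx, dy, dz = x - x0, y - y0, z - z0
    wx, wy, wz = x1 - x0, y1 - y0, z1 - z0      # cell widths

    c0 = p[0, 0, 0]
    c1 = (p[1, 0, 0] - p[0, 0, 0]) / wx
    c2 = (p[0, 1, 0] - p[0, 0, 0]) / wy
    c3 = (p[0, 0, 1] - p[0, 0, 0]) / wz
    c4 = (p[1, 1, 0] - p[0, 1, 0] - p[1, 0, 0] + p[0, 0, 0]) / (wx * wy)
    c5 = (p[1, 0, 1] - p[0, 0, 1] - p[1, 0, 0] + p[0, 0, 0]) / (wx * wz)
    c6 = (p[0, 1, 1] - p[0, 0, 1] - p[0, 1, 0] + p[0, 0, 0]) / (wy * wz)
    c7 = (p[1, 1, 1] - p[0, 1, 1] - p[1, 0, 1] - p[1, 1, 0]
          + p[1, 0, 0] + p[0, 0, 1] + p[0, 1, 0] - p[0, 0, 0]) / (wx * wy * wz)

    return (c0 + c1 * dx + c2 * dy + c3 * dz
            + c4 * dx * dy + c5 * dx * dz + c6 * dy * dz
            + c7 * dx * dy * dz)

# Illustrative check: a function that is itself trilinear is reproduced exactly.
def f(x, y, z):
    return 1 + 2*x + 3*y + 4*z + 5*x*y + 6*x*z + 7*y*z + 8*x*y*z

lo, hi = (0.0, 0.0, 0.0), (2.0, 4.0, 8.0)
corners = {(i, j, k): f(hi[0] if i else lo[0],
                        hi[1] if j else lo[1],
                        hi[2] if k else lo[2])
           for i in (0, 1) for j in (0, 1) for k in (0, 1)}
value = trilinear(corners, lo, hi, (0.5, 1.0, 3.0))
```

The seven divisions depend only on the cell geometry, which is one reason a hardware datapath can share divider units across the coefficients.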
5 Software Implementation
Software programs implementing the Shepard's, Moving Matrix, and ICI algorithms were written in the C programming language. The programs read data files that contained the forward printer map and the gamut-mapped input points of the inverse map. The programs executed the algorithms on this data to compute the inverse for the given input points; this output data was written to separate data files for later analysis. The forward printer map was represented by two data files: one had $13^3$ CMY entries, and the other had the corresponding $13^3$ LAB entries. This experimental printer data was obtained from Dr. L. K. Mestha from Xerox.
5.1 Algorithm Metrics
These software implementations and simulations provided relative accuracy, execution time, and computational complexity information for each algorithm.
5.1.1 Determining Algorithm Complexity
The complexity of the algorithms was measured objectively by including code to count the number of multiplication/division and square root operations performed to compute the inverse. These operations are costly in hardware because the modules to perform them are both large and slow as compared to other operations. The algorithms were also evaluated subjectively, considering the required arithmetic modules, the amount of parallelism that might be exploited, and the complexity of the required control logic. A combination of the objective and subjective complexity was used to rate each algorithm's overall complexity relative to the others.
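One way such operation counting might be instrumented is sketched below. The `OpCounter` class and the Euclidean distance routine are hypothetical illustrations and are not taken from the thesis's C programs.

```python
import math

class OpCounter:
    """Tallies the operations that are expensive in hardware."""
    def __init__(self):
        self.mul_div = 0
        self.sqrt = 0

    def mul(self, a, b):
        self.mul_div += 1
        return a * b

    def div(self, a, b):
        self.mul_div += 1
        return a / b

    def root(self, a):
        self.sqrt += 1
        return math.sqrt(a)

def euclidean_distance(ops, u, v):
    """Distance between two points, routing costly operations through ops."""
    total = 0.0
    for a, b in zip(u, v):
        d = a - b                   # subtraction is not counted
        total += ops.mul(d, d)      # one multiplication per component
    return ops.root(total)          # one square root per distance

ops = OpCounter()
d = euclidean_distance(ops, (50.0, 10.0, -5.0), (47.0, 14.0, -5.0))
```

After the call, `ops.mul_div` and `ops.sqrt` hold the totals for that one distance computation; an algorithm under study would pass the same counter through all of its arithmetic.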
5.1.2 Determining Algorithm Execution Speed
The csh command interpreter running on the Unix platform has a built-in utility that can be used to determine the amount of time a program is executing on the system. The execution times of the programs were measured using this utility, which is an executable named time. In order to form a valid comparison, the programs were executed on the same machine under the same environment. Program development and execution was on Rochester Institute of Technology's Grace computing system. Grace is an Alpha 4100 5/533 with 3 EV5.6 CPUs. It has 1.5 GB of memory and is running Tru64 UNIX, version 4.0F.
5.1.3 Determining Algorithm Accuracy
The programs computed the inverse for the $13^3$ gamut-mapped LAB points $x$ using the forward printer map data. The completion of the programs resulted in a data file containing $13^3$ CMY entries that defined the inverse $y$:
$$y = P^{-1}(x)$$
Each point $y_j$ was interpolated through the forward printer map $P$ to find the resulting $z_j$. The differences $\Delta E_j$ between $z_j$ and the corresponding $x_j$ such that $y_j = P^{-1}(x_j)$ were computed. This can be shown graphically as:
$$x_j \xrightarrow{\;P^{-1}\;} y_j \xrightarrow{\;P\;} z_j, \qquad \Delta E_j = \Delta E(x_j, z_j)$$
The mean error and standard deviation, the minimum error, and the maximum error were calculated from the $\Delta E_j$; these statistics represented the accuracy of the algorithms. It should be noted that these error statistics were computed using the node points of the inverse map as inputs. The accuracy metrics of each algorithm were used to compare the algorithms in terms of accuracy. Separate programs were written to perform the interpolation through $P$ and to compute the accuracy metrics; these are also included on the CD accompanying this paper.
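The accuracy statistics can be sketched as follows. Two assumptions are made here that the text does not state: $\Delta E$ is taken as the Euclidean distance in LAB, and the standard deviation is the population form. The sample points are invented for illustration.

```python
import math
import statistics

def delta_e(lab_a, lab_b):
    """Color difference between two LAB points (Euclidean distance assumed)."""
    return math.dist(lab_a, lab_b)

def accuracy_stats(x_points, z_points):
    """Summarize the round-trip errors between inputs x_j and results z_j."""
    errors = [delta_e(x, z) for x, z in zip(x_points, z_points)]
    mean = statistics.mean(errors)
    std = statistics.pstdev(errors)     # population standard deviation (assumption)
    return {
        "mean": mean,
        "std": std,
        "mean_plus_2std": mean + 2 * std,
        "min": min(errors),
        "max": max(errors),
    }

# Invented LAB points: requested colors x_j and round-trip results z_j.
x_points = [(50.0, 0.0, 0.0), (60.0, 10.0, 10.0), (70.0, -5.0, 5.0)]
z_points = [(50.0, 3.0, 4.0), (60.0, 10.0, 10.0), (71.0, -5.0, 5.0)]
stats = accuracy_stats(x_points, z_points)
```

The `mean_plus_2std` entry corresponds to the "Mean + 2*Std Dev" column reported in the accuracy results.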
5.2 Optimal Parameters
Each algorithm was executed several times while varying the algorithm parameters. Accuracy metrics for each trial were measured to gain insight into how the parameters affect the accuracy. These trials also led to the values of the parameters that result in the best accuracy. Tables showing the accuracy data for different parameter values for each algorithm are shown in the Appendix (see Section 9.1). The results of these trials are summarized below.
5.2.1 Shepard's Algorithm
For Shepard's algorithm, there are two variable parameters, $p$ and $\mu$. The $p$ parameter specifies how the distances between points are calculated. The trials showed that the value of $p$ does not have much effect on the accuracy. Therefore, $p = 2$ was taken because this is the most convenient, as it results in a distance measurement that is simply the Euclidean distance. The $\mu$ parameter affects the locality of the weighting function. The simulation results show that $\mu = 5$ provides the best accuracy.
5.2.2 Moving Matrix Algorithm
The variable parameters in the Moving Matrix algorithm are $\mu$ and $\varepsilon$. Similar to the same parameter in Shepard's algorithm, the $\mu$ parameter affects the locality of the regression. Simulations showed that $\mu = 5$ resulted in the best accuracy. The $\varepsilon$ parameter does not have a significant effect on the accuracy and is taken as a small value to avoid ill-conditioning of the Si matrix. In these simulations, $\varepsilon$ was taken as $10^{-4}$.
5.2.3 ICI Algorithm
The ICI algorithm has parameters $\varepsilon$, $k_{max}$, $\delta$, and $\mu$. $\varepsilon$ and $k_{max}$ are the stop criteria; $\varepsilon$ specifies the error threshold at which the update iterations can stop, and $k_{max}$ is the maximum number of iterations that the algorithm will perform when computing the inverse for a given point. The parameter $\delta$ for this algorithm specifies the perturbation used in the numerical differentiation method used to compute the gradient matrix. $\mu$ should be selected in order to achieve fast convergence and to meet accuracy requirements.
The software simulations showed that the mean error for this algorithm can be reduced by decreasing the error threshold $\varepsilon$. In order to achieve fast convergence, $\varepsilon$ was taken as 0.5. This is below the "just noticeable difference" of $\Delta E = 1.0$. Other simulations have shown that even better accuracy can be obtained by further reducing $\varepsilon$ [4]. The software simulations also showed that the best choice of $k_{max}$ is 50. Increasing this parameter beyond 50 does not result in substantial gains in accuracy; if the algorithm is going to converge for a given input point, it will do so within 50 iterations. The perturbation $\delta$ was determined to not have any measurable effect on the accuracy of the algorithm; $\delta$ was taken to be 10, which is approximately a 4% variance of the printer's input color components ($0 \le C, M, Y \le 255$ and $10/255 \approx 0.04$) and is more or less arbitrary. Finally, the simulations showed $\mu = 4$ to be a good choice in terms of accuracy and convergence.
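Table 1 shows the optimal parameters found for each algorithm.
Table 1: Optimal Algorithm Parameters
Algorithm       Optimal Parameters
Shepard's       $p = 2$, $\mu = 5$
Moving Matrix   $\mu = 5$, $\varepsilon = 10^{-4}$
ICI             $\varepsilon = 0.5$, $k_{max} = 50$, $\delta = 10$, $\mu = 4$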
5.3 Simulation Results
The accuracy, speed, and complexity of each algorithm are summarized in Tables 2, 3, and 4, respectively. The metrics are obtained from the simulations using optimal algorithm parameters and operating on the same data and on the same machine in the same environment. All measurements are made for the algorithms computing the $13^3$ inverse map from the $13^3$ forward map.
Table 2: Algorithm Accuracy
Algorithm       Mean $\Delta E$   Std Dev   Mean + 2*Std Dev   Min    Max
Shepard's       1.64              0.54      2.73               0.15   3.81
Moving Matrix   1.28              2.29      5.86               0.02   54.88
ICI             0.39              0.12      0.63               0.05   1.68

Table 3: Algorithm Execution Time
Algorithm       Execution Time (sec)
Shepard's       12
Moving Matrix   17
ICI             7

Table 4: Algorithm Complexity
Algorithm       Complexity
Shepard's       LOW
Moving Matrix   HIGH
ICI             MEDIUM
ICI is clearly the winner in terms of accuracy and execution speed. A brief discussion of each algorithm's complexity follows.
Moving Matrix has significantly more multiplications than Shepard's and approximately [...] slow design. Moving Matrix also would require complex control logic to perform matrix operations such as matrix inversion.
Shepard's algorithm would be quite easy to implement because it requires a small number of resources and is data flow oriented, thus requiring very little control logic. However, there are a large number of multiplications (three times that of ICI), which will result in a slow design. Even if more resources were used to perform some of the processing in parallel (which would increase the design area), it is still unlikely it would beat out ICI in terms of speed. Although easy to implement, Shepard's is inaccurate and would most likely be slower in hardware than ICI.
ICI has the smallest number of multiplications, which could result in a less complex circuit. However, ICI has two significant drawbacks. First, this algorithm requires trilinear interpolation, which itself is a non-trivial hardware implementation. Second, ICI will require control logic for computing the gradient matrix and for computing the initial estimate by the clustering technique.
Of the three algorithms, ICI is the best in terms of speed, accuracy, and hardware complexity. This algorithm was chosen for hardware implementation in VHDL. The [...]
6 ICI Hardware Implementation
NOTE: All source VHDL files are located on the accompanying CD under the "Hardware/VHDL Source Code" folder. The area and timing reports for each sub-module described in this section can also be found on the accompanying CD under the "Hardware/Reports" folder. VHDL source code is also shown in the Appendix (see Section 9.2).
6.1 Design Methodology
The algorithm was implemented using VHDL. The design was simulated with Mentor Graphics' ModelSim; the simulation results were compared with results obtained from software models operating on the same input data. Correct functionality of the design was guaranteed since the hardware simulation results matched those of the software models.
The hardware modules were synthesized using Synopsys's Design Compiler and were targeted to a generic library provided with the Synopsys tools. The synthesis results provide a netlist that can be used for ASIC fabrication.
Several important design decisions were made about the system partitioning and architecture, data storage, and data representation. These decisions will be described in the following sections.
6.1.1 System Partitioning and Architecture
The ICI algorithm makes extensive use of trilinear interpolation. Trilinear interpolation is a geometric method to calculate the output of a 3D function at a given input point, where the 3D function is defined by a finite set of input and output pairs. A detailed discussion of trilinear interpolation can be found in Section 4.1.4. An entirely separate module was designed to perform trilinear interpolation; this module is used by the ICI design and can also be used in other applications.
Both the trilinear interpolation module and ICI module were designed using datapath blocks and control blocks. This approach allows for highly controlled use of the arithmetic units that make up the datapaths. The advantage of having such control over these units is that operations can be pipelined and the units can be shared where appropriate.
6.1.2 Data Storage
There are six values for each entry in the forward printer map $P$: one for each of the input space components and one for each of the output space components. For example, if $P$ defines a mapping from CMY to LAB, then there is a value for C, M, and Y for the input color space, and a value for L, A, and B for the output color space. An $N^3$ look-up table requires $6N^3$ bytes of storage. For $N = 13$, this results in 13,182 bytes of required storage.
Synthesis tools are still incredibly inefficient at synthesizing memory; therefore, this design assumes that an external memory module exists that contains $P$. The memory module in Figure 7 is assumed for design purposes. The interpolation and ICI modules are designed to interface with this memory module.
Figure 7: Memory Module Diagram
6.1.2.1 Operation
The memory module is synchronous and clocked by the clk input signal; all reads and writes occur on the rising edge of clk and complete in one clock cycle. The operation of the module is described below.
(a) Read: Reads occur when wr is low; data_out becomes the value stored at the location specified by addr.
(b) Write: Writes occur when wr is high; the location specified by addr is stored with the value on data_in.
(c) Memory location: addr specifies the memory location to read from during reads and the location to write to during writes. Its value can range from zero to $6N^3 - 1$.
The interpolation and ICI modules that interface to this memory never write to it; the wr and data_in signals are included for completeness.
6.1.2.2 Memory Data Layout
It is assumed that the memory module contains the forward printer look-up table prior to using the functionality of the interpolation and ICI modules. Furthermore, it is assumed that the values are arranged in the memory module in the following fashion:
Memory Location   Contents
0                 $C_0$
1                 $M_0$
2                 $Y_0$
3                 $L_0$
4                 $A_0$
5                 $B_0$
6                 $C_1$
7                 $M_1$
8                 $Y_1$
9                 $L_1$
10                $A_1$
...
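The layout above implies a simple address calculation: each table entry occupies six consecutive locations in the order C, M, Y, L, A, B. A minimal sketch follows; the function names are hypothetical, and how a lattice index triple maps to an entry number is not covered here.

```python
COMPONENTS = ("C", "M", "Y", "L", "A", "B")   # order within one table entry

def lut_locations(n):
    """Number of memory locations needed by an N^3 look-up table."""
    return 6 * n ** 3

def address(entry, component):
    """Memory location of one component of the given table entry."""
    return 6 * entry + COMPONENTS.index(component)

total = lut_locations(13)                 # the N = 13 table used in this work
last = address(13 ** 3 - 1, "B")          # final location in memory
```

For example, `address(1, "A")` gives location 10, matching the table, and the last location equals $6N^3 - 1$.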
6.1.3 Data Representation
The project specification required that two decimal places of accuracy be retained for the LAB values. This meant that the design would have to implement floating point operations. Floating point hardware is more complex and slower compared to fixed point hardware. Additionally, the design resources did not have floating-point libraries that contain pre-compiled and optimized floating-point modules. These libraries would either have to be purchased, or a significant amount of time and effort would be required to design custom floating-point arithmetic modules.
An alternative is to use fixed point arithmetic and scale all of the data to integers. This approach allows for the use of pre-compiled and optimized fixed point integer modules, which simplifies the design and reduces design time.
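A minimal sketch of scaled-integer arithmetic is shown here, assuming a scale factor of 100 so that two decimal places survive. The helper names are illustrative, and truncating integer division is an assumption; the rounding behavior of the hardware divide units is not stated.

```python
SCALE = 100   # two decimal places

def to_fixed(value):
    """Convert a real value to its scaled integer form, e.g. 123.45 -> 12345."""
    return round(value * SCALE)

def fixed_add(a, b):
    """Addition needs no correction: 100A + 100B = 100(A + B)."""
    return a + b

def fixed_mul(a, b):
    """(100A)(100B) = 10000AB, so divide by SCALE to get back to 100AB.

    Python's // floors toward negative infinity; real hardware may round
    differently (assumption).
    """
    return (a * b) // SCALE

def to_real(a):
    """Convert a scaled integer back to a real value for display."""
    return a / SCALE

a = to_fixed(123.45)          # 12345
b = to_fixed(2.00)            # 200
product = fixed_mul(a, b)     # 24690, i.e. 246.90
total = fixed_add(a, b)       # 12545, i.e. 125.45
```

Every multiplication therefore costs an extra division, which is the price paid for avoiding floating-point hardware.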
To preserve two decimal places, the data had to be scaled by a factor of 100. For example, 123.45 would be scaled as 123.45 * 100 = 12345. Integer addition and subtraction would occur normally. However, multiplications would require the result to be scaled down. This can be shown by the following equations:
Original Operation: $A \times B = AB$
With Scaled Data: $(100A) \times (100B) = 10000AB$
However, we only want $100AB$, which means this result must be scaled down by a factor of 100. This is the disadvantage of scaled fixed point arithmetic; multiplication operations must be followed by division. This disadvantage is a fair trade-off compared to the complexity involved in implementing a design to support floating-point numbers.
It was necessary to determine if using this integer-based approach would compromise the accuracy of the ICI algorithm. The ICI C language implementation files were modified to use only integer data in order to investigate the issue. The integer-based ICI C code is located on the accompanying CD under the "Software/integer-based ICI" folder. Simulation results showed that the accuracy of the integer-based implementation was equal to its floating-point-based counterpart.
6.2 Synthesis Methodology
The control blocks of the trilinear interpolation and ICI modules are the only synchronous (clocked) blocks in each of the designs. These blocks were synthesized for speed, and the maximum path delay through them defined the clock and maximum operating frequency for the entire design. All other blocks are asynchronous, and their maximum path delays are either shorter or longer than the defined clock. For those blocks that have a shorter maximum path delay, there is no concern because the circuit will provide valid results within the clock period. For those blocks that have a longer maximum path delay, the control blocks must either wait the appropriate number of clock cycles for the results to become valid or continue on with processing. In the latter case, computation in the block continues while the control block proceeds with other parts of the logic flow. This concept of parallel computation motivated the design of several separate datapaths that perform different functions. Wherever this type of parallelism could not be exploited, the control block simply waits the appropriate number of clock cycles for valid data.
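The number of clock cycles a control block must wait can be estimated from a block's maximum path delay and the clock period. The sketch below is a simplification that ignores setup and hold margins; the function name is hypothetical, the 40 ns period and the 38.87 ns and 10.36 ns delays are figures quoted in Section 6.4.1, and the 95 ns delay is invented for illustration.

```python
import math

def wait_cycles(path_delay_ns, clock_period_ns):
    """Whole clock cycles needed before a combinational result is valid."""
    return max(1, math.ceil(path_delay_ns / clock_period_ns))

CLOCK_NS = 40.0                            # 25 MHz system clock

fast_block = wait_cycles(10.36, CLOCK_NS)  # valid within one period
addr_block = wait_cycles(38.87, CLOCK_NS)  # still fits in one period
slow_block = wait_cycles(95.0, CLOCK_NS)   # invented delay: needs three periods
```

A block whose delay exceeds the clock period is exactly the case in which its timing report shows a violated constraint while the design still functions, because the control block counts out the extra cycles.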
There were no design specifications, such as area or speed, provided by the project requirements for the hardware implementation. However, the design has application in real-time and near real-time applications. It was also designed with ASIC implementation in mind. For these reasons, the design blocks were synthesized for the fastest possible performance. This is a reasonable synthesis constraint since the goal is real-time computation, and the ASIC platform allows for larger and more complex designs as compared to programmable logic, such as field programmable gate arrays (FPGA).
The asynchronous blocks were constrained by specifying input and output delays equal to the defined clock. These constraints forced the synthesis tool to generate the fastest possible design, but one that would never meet the timing constraints because of the complexity of the operations being performed. Several of the synthesized blocks' timing reports clearly show that they do not meet timing constraints. This is acceptable because in this design methodology it is understood that these blocks require more than one clock cycle.
6.3 Simulation & Testing
The VHDL code was simulated using Mentor Graphics' ModelSim tool. Testbenches were written in order to test and verify correct functionality of the hardware design. These testbenches read input data from files, fed it to the module under test, and wrote the output data into another data file. Other inputs were also controlled by the testbenches; these inputs, such as algorithm parameters, were the same as those used in the C language software simulations. The output data files written by the testbenches were compared to the output data files written by the C software models. Correct functionality of the VHDL design was guaranteed because the data contained in the output files matched. The VHDL source code for the testbenches can be found in the [...]
Hardware Modules
[image:49.499.99.410.223.349.2]6.4.1 Trilinear Interpolation Module
(trijop.vhd)
Figure 8 shows the I/O signals ofthe trilinear interpolationmodule, andFigure 9 shows
theblock diagram ofthemodule. Inputand outputsignals aredescribed in Table 5.
Table 5: Trilinear Interpolation Module I/O Signals
Signal Name   I/O      Type                      Description
Clk           Input    std_logic                 System clock.
Memin         Input    std_logic_vector[31:0]    Input data from external memory.
Start         Input    std_logic                 Interpolation operation begins on the rising edge of this signal.
X0            Input    std_logic_vector[31:0]    Specifies the first component of the input point.
X1            Input    std_logic_vector[31:0]    Specifies the second component of the input point.
X2            Input    std_logic_vector[31:0]    Specifies the third component of the input point.
Done          Output   std_logic                 Signals the completion of the interpolation operation and valid output.
Error         Output   std_logic                 Signals an error occurred during the interpolation operation. Note: An error can occur if the input point specified by X0, X1, and X2 is located outside of the input space defined in the LUT.
Memaddr       Output   std_logic_vector[31:0]    Specifies the location of the desired data in memory.
Z0            Output   std_logic_vector[31:0]    First component of the output value; valid when done=1 and error=0.
Z1            Output   std_logic_vector[31:0]    Second component of the output value; valid when done=1 and error=0.
Z2            Output   std_logic_vector[31:0]    Third component of the output value; valid when done=1 and error=0.
6.4.1.1 Operation
The system clock must have a period of 40 nanoseconds or greater; this corresponds to a maximum operating frequency of 25 MHz. Simulation results show that this design can [...]
The rising edge of start begins interpolation at the input point specified by signals X0, X1 and X2. Done is asserted when interpolation is complete. If error is not raised, valid output occurs on signals Z0, Z1 and Z2. Memin and memaddr are interface signals to the external memory module. Memin is data coming from the memory; memaddr is the location in memory of desired data and is controlled by the interpolation module's control block.
6.4.1.2 Sub-blocks
6.4.1.2.1 TRI_CTRL_BLOCK (trictrl.vhd)
This block contains the control logic needed to carry out the interpolation. Its input and output signals interface to the other sub-blocks and to the external memory module to control data flow. It is also responsible for beginning the interpolation when start is raised and for asserting done and error when the interpolation is complete.
The top level diagram is shown in Figure 10. A lower level block diagram is not available because it contains too many low-level components (primarily registers) and [...]
Figure 10: Top Level Diagram of Trilinear Interpolation TRI_CTRL_BLOCK
The area of this block is 18,148 cells. The maximum path delay through this block is [...]
6.4.1.2.2 TRI_ADDR_BLOCK (triaddrblock.vhd)
This block defines the datapath for calculating the memory address of a desired value in memory. The data needed from memory are the lattice points of the cell containing the input point and their corresponding points in the output. These lattice points are located by indices which were determined by searching the input space for the appropriate cell containing the input point. These indices are input to this block, which computes the memory address of the desired values.
The top level diagram showing input and output signals of this block can be found in Figure 11. A low level block diagram is shown in Figure 12.
Figure 12: Datapath Architecture of TRI_ADDR_BLOCK
The area of this block is 20,014 cells. The maximum path delay through this block is 38.87 nanoseconds.
6.4.1.2.3 TRI_CX_BLOCK (tri_cx_block.vhd)
This block defines the datapath for calculating the C1 through C7 values (henceforth known as CX values) used to calculate the output. These values are calculated from the [...] is needed to compute all seven values because these units are shared due to pipelining of operations by the control block.
The top level diagram showing input and output signals of this block can be found in Figure 13. A low level block diagram is shown in Figure 14.
Figure 13: Top Level Diagram of Trilinear Interpolation TRI_CX_BLOCK
The area of this block is 3,658 cells. The maximum path delay through this block is 10.36 nanoseconds.
6.4.1.2.4 TRI_TERM_BLOCK (tritermblock.vhd)
This block defines the datapath for calculating the interpolated values; it is the workhorse of the interpolation module. It uses the CX values produced by the TRI_CX_BLOCK and several other inputs provided by the control block to compute the interpolated values.
The top level diagram showing input and output signals of this block can be found in Figure 15. A low level block diagram is shown in Figure 16. The disadvantage of using scaled data can be seen in this low level diagram; there are divide units throughout the datapath that are needed to scale down the results of the multiplication-rich calculation performed by this block. These divide units add significant area and delays to th