Analysis and hardware implementation of color map inversion algorithms


Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

8-1-2002

Analysis and hardware implementation of color map inversion algorithms

Michael Martin


Analysis and Hardware Implementation of Color Map Inversion Algorithms

by

Michael W. Martin

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

Computer Engineering

Approved By:

Principal Advisor: Kenneth Hsu, Professor, Computer Engineering

Committee Member: Soheil Dianat, Professor, Electrical Engineering

Committee Member: Athimoottil Mathew, Professor, Electrical Engineering

Department Head: Andreas Savakis, Associate Professor and Department Head, Computer Engineering

Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York

REPRODUCTION PERMISSION STATEMENT

PERMISSION DENIED

Analysis and Hardware Implementation of Color Map Inversion Algorithms

I, Michael W. Martin, hereby deny permission to any individual or organization to reproduce this thesis in whole or in part.

Michael W. Martin

Abstract

The purpose of this thesis is to investigate several algorithms that are used to compute the inverse of a forward printer map. The forward printer map models the printer by mapping points in the printer's input color space to points in the printer's output color space. The inverse of this forward map is required to convert input color specifications in a device-independent color space to a color in the printer's device-dependent color space before being presented to the print engine. The accuracy of the inverse printer map directly affects the accuracy of the reproduced colors. Therefore, any measured change in the forward printer map requires re-computation of the inverse map if accurate and consistent color reproduction is to be maintained. An efficient and accurate method of computing the inverse map could be used in an automatic color correction system.

Three algorithms for computing the inverse of the forward printer map are studied in this thesis project. These are the Shepard's, Moving Matrix, and Iteratively Clustered Interpolation (ICI) algorithms. The algorithms are implemented in C and simulated in order to benchmark their relative accuracy, speed, and complexity. The simulations show the ICI algorithm to be the fastest and most accurate at computing the inverse map, and its complexity does not far exceed that of the other algorithms. The ICI algorithm was implemented in VHDL and synthesized to a Synopsys generic library in order to determine the approximate size and speed of an ASIC that could perform the inverse computation. The final implementation resulted in two modules: one that implements the ICI algorithm, and one that implements the trilinear interpolation function that is used by [...] synthesized trilinear interpolation module contained 190,357 cells. The timing of the modules resulted in a 40 nanosecond clock period, which corresponds to a maximum operating frequency of 25 MHz. These synthesized results show that this algorithm is [...]

Table of Contents

1 INTRODUCTION
2 BACKGROUND INFORMATION
2.1 The Xerographic Printing Process
2.2 Color Spaces
2.2.1 RGB Color Space
2.2.2 CMY Color Space
2.2.3 LAB Color Space
2.3 Gamut Mapping
3 DEVICE MODELING
3.1 Forward Printer Map
3.2 Inverse Printer Map
4 THEORY
4.1 Algorithms
4.1.1 Shepard's Algorithm
4.1.2 Moving Matrix Algorithm
4.1.3 Iteratively Clustered Interpolation Algorithm
4.1.4 Trilinear Interpolation
5 SOFTWARE IMPLEMENTATION
5.1 Algorithm Metrics
5.1.1 Determining Algorithm Complexity
5.1.2 Determining Algorithm Execution Speed
5.1.3 Determining Algorithm Accuracy
5.2 Optimal Parameters
5.2.1 Shepard's Algorithm
5.2.2 Moving Matrix Algorithm
5.2.3 ICI Algorithm
5.3 Simulation Results
6 ICI HARDWARE IMPLEMENTATION
6.1 Design Methodology
6.1.2 Data Storage
6.1.2.1 Operation
6.1.2.2 Memory Data Layout
6.1.3 Data Representation
6.2 Synthesis Methodology
6.3 Simulation & Testing
6.4 Hardware Modules
6.4.1 Trilinear Interpolation Module (tritop.vhd)
6.4.1.1 Operation
6.4.1.2 Sub-blocks
6.4.1.3 Summary of Results
6.4.2 ICI Module (icitop.vhd)
6.4.2.1 Operation
6.4.2.2 Input Description
6.4.2.3 Sub-blocks
6.4.3 Summary of Results
6.5 System Overview
7 CONCLUDING REMARKS
7.1 Conclusions
7.2 Future Work
7.3 Acknowledgements
8 REFERENCES
9 APPENDICES
9.1 Software Simulation Accuracy Results
9.1.1 Shepard's Interpolation
9.1.2 Moving Matrix
9.1.3 ICI
9.2 VHDL Code
9.2.1 TRI_TOP.VHD
9.2.2 TRI_CTRL.VHD
9.2.3 TRI_CX_BLOCK.VHD
9.2.4 TRI_TERM_BLOCK.VHD
9.2.5 TRI_ADDR_BLOCK.VHD
9.2.6 TRI_TB_TOP.VHD
9.2.7 ICITOP.VHD
9.2.8 ICI_CTRL.VHD
9.2.9 ICI_DP.VHD
9.2.10 DIS_DP.VHD
9.2.11 ADDR_CALC.VHD

List of Figures

Figure 1: The Xerographic Printing Process
Figure 2: Axes of the CIELAB color space
Figure 3: Gamut Mapping for Constant Lightness and Hue
Figure 4: System of $P^{-1}$ and $P$
Figure 5: Sub-cube with Lattice Points Determined During Extraction
Figure 6: 1D Linear Interpolation
Figure 7: Memory Module Diagram
Figure 8: Top Level Diagram of Trilinear Interpolation Module
Figure 9: Submodule Diagram of Trilinear Interpolation Module
Figure 10: Top Level Diagram of Trilinear Interpolation TRI_CTRL Block
Figure 11: Top Level Diagram of Trilinear Interpolation TRI_ADDR_BLOCK
Figure 12: Datapath Architecture of TRI_ADDR_BLOCK
Figure 13: Top Level Diagram of Trilinear Interpolation TRI_CX_BLOCK
Figure 14: Datapath Architecture of TRI_CX_BLOCK
Figure 15: Top Level Diagram of Trilinear Interpolation TRI_TERM_BLOCK
Figure 16: Datapath Architecture of TRI_TERM_BLOCK
Figure 17: Top Level Diagram of ICI Module
Figure 18: Submodule Diagram of ICI Module
Figure 19: Top Level Diagram of ICI_CTRL Block
Figure 20: Top Level Diagram of ICI ADDR_CALC Block
Figure 21: Datapath Architecture of ICI ADDR_CALC Block
Figure 22: Top Level Diagram of ICI DIS_DP Block
Figure 23: Datapath Architecture of ICI DIS_DP Block
Figure 24: Top Level Diagram of ICI_DP Block
Figure 25: Datapath Architecture of ICI_DP Block
Figure 26: Diagram of System to Compute $P^{-1}$

List of Tables

Table 1: Optimal Algorithm Parameters
Table 2: Algorithm Accuracy
Table 3: Algorithm Execution Time
Table 4: Algorithm Complexity
Table 5: Trilinear Interpolation Module I/O Signals
Table 6: Trilinear Interpolation Module Synthesis Results
Table 7: ICI Module I/O Signals

1 Introduction

One of the most crucial challenges facing today's printer manufacturers is to design printers that accurately and consistently reproduce the colors in images [1]. This is a difficult task because the transformation from the printer's input color space to its output color space is non-linear and drifts over time. Color reproduction is also affected by the physical properties of the media, including temperature, moisture content, brightness, and weight. These variables are constantly changing and can lead to inaccurate and inconsistent color reproduction. When this occurs, the printer must be re-calibrated. The calibration process is typically manual and time-consuming and reduces the productivity of the printer.

A color correction system is needed to account for these variations in order to accurately and consistently reproduce the colors in printed images. Solutions that can automate the color correction process will lead to very productive printers with little "downtime". There is an abundance of research activity aimed at developing efficient and effective color calibration and control systems [6, 8, 10]. The forward and inverse printer maps are central to many of these systems.

A forward printer map is a practical and accurate model of a printer. This forward map takes the form of a multidimensional look-up table and is constructed by performing input-output color experiments on an actual printer. The inverse of this forward map, called the inverse printer map, is needed to convert an input color specification in a device-independent color space to a color in the printer's device-dependent color space before being presented to the print engine. If the inverse printer map is accurate, the print engine will produce a color reproduction that closely matches the original device-independent color specification. Any measured change in the forward printer map requires re-computation of the inverse map if color accuracy and consistency are to be maintained. A method for accurately and efficiently computing the inverse of the forward map is sought.

Section 1 of this paper is this brief introduction. In Section 2, several important topics relating to this project will be discussed. Section 3 describes how printers are modeled. Section 4 presents the problem at hand and the theory behind the problem. The inversion algorithms are also discussed in this chapter. Section 5 describes the software implementation and results. Section 6 describes the hardware implementation and results. Finally, Section 7 concludes the paper with a discussion of conclusions, future work, and acknowledgements. References and Appendices can be found at the end of the paper. The CD accompanying this paper includes the paper itself as well as data files, [...]

2 Background Information

2.1 The Xerographic Printing Process

Imaging systems consist of a network made up primarily of video display monitors, scanners, and printers. Each of these devices has its own device-specific color space: generally RGB for monitors and scanners and CMY for printers. These devices communicate image data with each other using a psycho-physically based, device-independent color space, such as LAB.

A specification in the device-independent color space is transformed into a device-dependent color space using a multidimensional color correction table or inverse color map. This inverse map is based on the forward transformation characteristics of the printer. Once converted into the device-dependent color space, the color-separated image is converted to halftones and delivered to the print engine that is responsible for the physical production of the print [8]. This process is illustrated in Figure 1.

[Figure 1: The Xerographic Printing Process. Color Specification (LAB) -> Printer Pre-processor (Inverse Color Map) -> CMY -> Print Engine (Forward Color Map) -> Color Reproduction (LAB)]

2.2 Color Spaces

Color is specified in a three-dimensional space where each point in the space describes a single color. There are several such color spaces, such as RGB, CMY, and LAB. This project focuses on printers; therefore, this research work was done using the CMY and LAB color spaces described in this section. The RGB color space is also described in order to further explain and contrast the additive and subtractive nature of the color systems. Although not used directly in the printing process, RGB is the color system used on the displays that show the images prior to printing and to which the color reproduction is often compared.

2.2.1 RGB Color Space

In the RGB space, a color consists of the primary components red, green, and blue. The RGB system is additive, meaning that varying amounts of the primaries are added together to produce a given color. The additive nature of the RGB system makes it the color system of choice for devices that are self-illuminating light sources, such as displays and scanners. This color space is known as a device-dependent color space because the same RGB color specification will vary perceptually on different devices.

2.2.2 CMY Color Space

Another device-dependent color space is the CMY space. A color in this space consists [...] meaning that varying amounts of the primaries are used together to absorb certain colors; those colors that are reflected, rather than absorbed, combine to form the perceived color. Color prints are illuminated by an external light source, and the perceived colors are due to the combination of reflected wavelengths. Thus, a printer's input color space is typically described in CMY.

Printers typically add a fourth ink, black, which results in four-color CMYK. The black ink is added for several reasons. To produce black using only the CMY primaries, 100 percent of each ink must be used. This is a disadvantage because color inks are more expensive and because using 100 percent of the three primaries results in a very wet print that is susceptible to wrinkles and takes a long time to dry. Furthermore, the combination of full CMY usually results in more of a muddy brown than a black color. Also, because the overlay of inks may not be exact, black portions of a print may have edges that do not have full CMY coverage. If these reasons are not enough, including black in the color mix can increase the contrast of colors, thereby increasing a printer's color gamut.

Replacing some percentage of the three primaries with black is achieved by gray component replacement and under color removal [7]. A color can be specified using only these primaries; the black component is added later by a separate and independent processing step. In this project, we need only consider the three primaries cyan, magenta, and yellow.

2.2.3 LAB Color Space

The LAB (CIELAB) color system was developed by the Commission Internationale de l'Eclairage (CIE) in 1976 [7]. Any given light has a distribution of wavelengths that is called its spectral power distribution, and this system is based on the spectral power distribution of colors. The CIE defined three color-matching functions for humans that are based on the human psycho-physical perception of color. The spectral power distribution is weighted by each of the three color-matching functions and integrated to provide what are known as the tristimulus values of the color. The LAB system defines a three-dimensional color space with tristimulus values L, a, and b. The L component is a measure of the lightness of a color. This axis varies from 0 (black) to 100 (reference white). The a component is a chromatic measure of red-green, and the b component is a chromatic measure of blue-yellow. The maximum of a, specified only by +a, is only red with no green, while the minimum of a, specified only by -a, is only green with no red. Similarly, the maximum of b, specified only by +b, is only yellow with no blue, and the minimum of b, specified only by -b, is only blue with no yellow. The axes of this color space are shown in Figure 2.

The LAB color space is device-independent; color specifications using this system are representative only of the color's spectral power distribution and are in no way related to any particular device. Colors with identical spectral power distributions are perceived to be exactly the same color, provided they are seen in the same visual environment, regardless of the manner in which the color is being displayed. This color space is the industry standard for describing colors in a manner that is independent of any device.

2.3 Gamut Mapping

Printers are incapable of reproducing every single color in a given color space. The set or range of colors that a printer can reproduce is known as the printer's color gamut. An input color specification can be located either inside or outside of this gamut. A printer can immediately reproduce a color that is inside the gamut. Colors located outside of the gamut are not reproducible by the printer and must be mapped to a color in the gamut.

This section describes the gamut mapping process that was used in this project [4]. Gamut mapping was necessary after constructing the structured input of the inverse printer map (see Section 4). The process consists of two steps: first determining whether an input color specification (input point) is located inside or outside the gamut, then mapping colors that are outside of the gamut to colors inside the gamut.

The procedure to determine whether an input point is located inside or outside of the printer gamut begins by upsampling the forward printer map P. This $13^3$ [...] upsampled to $64^3$ using trilinear interpolation. The upsampling creates a continuous 3D solid with a smooth boundary in the CMY and corresponding LAB color spaces. The boundary points of the CMY structure are assumed to correspond to the boundary points of the LAB structure. This is a practical and acceptable assumption for most xerographic printers.

The input point and the boundary points of the LAB structure are shifted with respect to the centroid of the LAB structure. This centroid is the average of all of the points in the gamut, approximately L=50, a=0, and b=0 for most xerographic printers. These points are then converted to spherical coordinates $r$, $\alpha$, and $\theta$. The conversion is given by the following equations:

$$r = \sqrt{(L - L_E)^2 + (a - a_E)^2 + (b - b_E)^2}$$

$$\alpha = \arctan\left(\frac{b - b_E}{a - a_E}\right)$$

$$\theta = \arctan\left(\frac{L - L_E}{\sqrt{(a - a_E)^2 + (b - b_E)^2}}\right)$$

In the above equations, $[L_E\ a_E\ b_E]$ specifies the centroid of the LAB structure, $r$ is the distance from the centroid to the input point, and $\alpha$ is the hue angle with a range of 360 degrees. $\theta$ is the angle in a plane of constant $\alpha$ with a range of 180 degrees.
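The conversion to spherical coordinates about the gamut centroid can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis code: the function name `lab_to_spherical` is hypothetical, and `atan2` is used in place of a plain arctangent so that the full 360-degree hue range is resolved and a zero denominator does not fail.

```python
import math

def lab_to_spherical(point, centroid):
    """Return (r, alpha, theta) of a LAB point relative to the gamut centroid.

    alpha is the hue angle in degrees in [0, 360); theta is the angle out of
    the a-b plane in degrees in [-90, 90], i.e. a 180-degree range.
    """
    dL = point[0] - centroid[0]
    da = point[1] - centroid[1]
    db = point[2] - centroid[2]
    r = math.sqrt(dL * dL + da * da + db * db)
    # Assumption of this sketch: atan2 replaces arctan(b/a) so that the
    # quadrant is resolved and da == 0 does not divide by zero.
    alpha = math.degrees(math.atan2(db, da)) % 360.0
    theta = math.degrees(math.atan2(dL, math.hypot(da, db)))
    return r, alpha, theta
```

For example, with the centroid at (50, 0, 0), the point (60, 0, 10) lies at a hue angle of 90 degrees and is raised 45 degrees out of the a-b plane.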

The set of boundary points is searched for those points that fall into the range $\alpha \pm \Delta\alpha$ and [...] cone around the input point. The average distance $r$ is calculated for all boundary points that fall within this cone. If the distance parameter of the input point is greater than this average, then the input point is considered to be outside of the gamut. Conversely, if the input point's distance is less than the average, it is considered to be inside the gamut.
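The inside/outside decision can be sketched as follows, assuming all points are already expressed as (r, alpha, theta) triples in degrees relative to the centroid. The function name `inside_gamut` and the cone half-widths `d_alpha` and `d_theta` (and their default values) are assumptions made for illustration; the values used in the thesis are not given in this section. A point exactly at the average radius is treated here as inside, which is also a choice of this sketch.

```python
def inside_gamut(point_sph, boundary_sph, d_alpha=5.0, d_theta=5.0):
    """Decide whether a point lies inside the gamut.

    point_sph is (r, alpha, theta); boundary_sph is a list of such triples
    for the gamut boundary. Angles are in degrees.
    """
    r, alpha, theta = point_sph
    radii = []
    for rb, ab, tb in boundary_sph:
        # Hue angles wrap around at 360 degrees.
        hue_gap = abs(ab - alpha) % 360.0
        hue_gap = min(hue_gap, 360.0 - hue_gap)
        if hue_gap <= d_alpha and abs(tb - theta) <= d_theta:
            radii.append(rb)
    if not radii:
        raise ValueError("no boundary points fall inside the search cone")
    # Inside if the point is no farther from the centroid than the average
    # boundary point found within the cone.
    return r <= sum(radii) / len(radii)
```

If the cone is too narrow for the boundary sampling density, no boundary points are found; the sketch raises an error in that case rather than guessing.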

A point outside the gamut should be mapped to a point in the gamut such that the original and mapped points are as perceptually close as possible. Research has shown that the perceptual difference between two colors is lowest when the lightness and hue of the colors are equal [5, 7]. Therefore, the mapping approach used for this project aimed to preserve the lightness and hue of the original point. The boundary of the LAB structure was searched to find the point that best satisfied the mapping criteria described above and given by the following equations:

$$L = L'$$

$$\arctan\left(\frac{b'}{a'}\right) = \arctan\left(\frac{b}{a}\right)$$

$[L\ a\ b]$ specifies the original input point and $[L'\ a'\ b']$ specifies the mapped point. The mapping is illustrated in Figure 3.

[Figure 3: Gamut Mapping for Constant Lightness and Hue. Note: a-b plane for constant L shown (L = L').]

3 Device Modeling

Device color calibration requires a model of the device. These models are generally either theoretical or empirical. The advantage of theoretical models is that color prediction can be performed with relatively few actual measurements. However, these analytical models are usually not very accurate because they do not adequately capture the nonlinearities of real systems, due in large part to external variables. Examples of such external variables are temperature, humidity, and the weight and type of printable media. These variables have a significant effect on the colors that are reproduced. The Neugebauer equations are a classic example of color prediction using a theoretical model.

Polynomial regression is one type of empirical model. This model suffers from accuracy problems; prediction of colors close to the sample points used for the regression provides acceptable accuracy, but for the rest of the points in the gamut there is no guarantee [9]. The most accurate and practical, and therefore the industry standard, empirical model is a look-up table containing input-output pairs that characterize the device. This model is discussed in detail in the following section.

3.1 Forward Printer Map

A color printer can be viewed as a device that maps colors from an input color space to an output color space. The printer can be modeled by a look-up table containing points in the input color space and their corresponding mapping to points in the output color space. The full model can be realized by interpolation for input points not contained in the look-up table. The look-up table is an approximation to the actual printer function; the larger the look-up table, the better the approximation. This look-up table characterizes the printer function and is referred to as the forward printer map. The forward printer map is denoted $P$.

The forward printer map is constructed experimentally by printing colors in the device's input color space on paper and measuring these color patches in the output color space using a spectrophotometer. The disadvantage of this model is that many color experiments must be performed in order to construct a forward map that is an accurate [...] especially because this process is currently non-automated. For example, to cover the entire gamut of a typical printer, a 10x10x10 entry table resulting in 6,000 parameters is needed for acceptable modeling accuracy [6, 8].

The input of the forward printer map can be structured by sampling the printer's input space in equal steps along each axis of the source domain. These sampled points become the "node", "grid", or "lattice" points of the forward map. For $n$ levels of division, this gives $(n-1)^3$ cubes and $n^3$ lattice points. The $n^3$ lattice points of the source space are printed, and these color patches are measured to determine the output color space specification. The corresponding values from the source and destination spaces populate the look-up table. Note that the output of $P$ will be unstructured (or non-uniform) because the transformation from input color space to output color space is typically highly nonlinear.
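The structured input of the forward map can be illustrated with a short Python sketch that generates the lattice points for $n$ levels per axis. The function names and the normalized [0, 1] range for each CMY axis are assumptions made for illustration only.

```python
from itertools import product

def lattice_points(n, lo=0.0, hi=1.0):
    """Return the n**3 equally spaced lattice points of a 3D input space."""
    step = (hi - lo) / (n - 1)
    levels = [lo + i * step for i in range(n)]
    return list(product(levels, repeat=3))

def cube_count(n):
    """Number of sub-cubes formed by n levels of division per axis."""
    return (n - 1) ** 3
```

With n = 10 this gives 1,000 lattice points and 729 sub-cubes. Storing three input and three output values per entry would account for the 6,000 parameters quoted above, although that reading of the figure is an inference.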

3.2 Inverse Printer Map

The inverse printer map, denoted $P^{-1}$, is constructed from the forward printer map $P$. The inverse map associates points in the output color space with points in the device's input color space. Simply swapping the data from $P$ is not desirable for two reasons. First, the inverse look-up table will not be well defined for colors near the gamut boundary because it is possible that more than one input point is mapped to the same output point. Second, it results in a look-up table with unstructured input. Interpolation with data that has unstructured input is complex and time-consuming. On the other hand, if the input is [...]

Geometrical linear interpolation is preferred over other non-linear interpolation techniques because it is more efficient and accurate. There are several geometrical interpolations, including trilinear, prism, pyramid, and tetrahedral [5]. Each of these varies in terms of accuracy, extraction (or search) complexity, and interpolation (or computation) complexity. In this thesis project, trilinear interpolation was utilized for look-up table interpolation (see Section 4.1.4).

Instead of simply swapping the data from $P$, algorithms are required that interpolate irregularly sampled multi-dimensional data from $P$. These algorithms must be capable of efficiently computing the inverse map so that it has structured input points and so that it is as good an approximation to the true inverse as possible. Several of these algorithms are discussed in the next section.

4 Theory

Consider the forward printer map $P$ that maps a point $y$ in the printer's input color space to a point $z$ in the output color space. Also consider the inverse printer map $P^{-1}$ that maps a point $x$ in the target color space to a point $y$ in the printer's input color space. Given a point $x$ in the target color space, the inverse color map is used to find a point $y$ in the printer's input color space that the printer will map to a point $z$ in the output color space.

[Figure 4: System of $P^{-1}$ and $P$]

The desired result is for the printer to print a color $z$ that, when measured, will be as close as possible to the requested color $x$. The difference between the target and output colors can be quantified by computing the Euclidean distance between them. This difference is known as $\Delta E(P(y), x)$ or simply $\Delta E$ and is given by Equation 1.

$$\Delta E = \|z - x\| = \|P(y) - x\| \tag{1}$$

Note that $z = P(y)$.

In general, and specifically for this project, the target and output color space is the device-independent LAB color space. For the target color $(LAB)_{in}$ and the output color $(LAB)_{out}$, Equation 1 becomes:

$$\Delta E = \sqrt{(L_{out} - L_{in})^2 + (A_{out} - A_{in})^2 + (B_{out} - B_{in})^2}$$

The smaller the value of $\Delta E$ for a given input, the better the inverse map. Humans do not perceive color differences less than 1.0. $\Delta E = 1.0$ is known as the "just noticeable color difference."
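The color difference is simple to compute in code. The following Python sketch evaluates the Euclidean distance between two LAB triples; the function name `delta_e` and the constant name are illustrative choices, not names taken from the thesis software.

```python
import math

# Threshold below which a color difference is not perceived, per the text.
JUST_NOTICEABLE_DIFFERENCE = 1.0

def delta_e(lab_out, lab_in):
    """Euclidean distance between an output color and a target color in LAB."""
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(lab_out, lab_in)))
```

For instance, an output of (50, 3, 4) against a target of (50, 0, 0) gives a difference of 5.0, which is well above the just noticeable difference.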

This leads to the following optimization problem:

Given a target color $x$ and the forward printer map $P$, find the printer input $y$ that solves

$$\min_{y} \Delta E(P(y), x) \tag{2}$$

Solving this optimization problem results in the inverse printer map $P^{-1}$. The structured inverse printer look-up table is constructed according to the methodology outlined in [1], re-stated below:

1. Obtain the estimated forward printer map, $P$, of a given color printer. This is achieved by equally sampling the printer input space $Y$ and then obtaining the printed color $z_j$ for each grid node $y_j$ from experiments on the actual printer. In addition, some interpolation technique is used to estimate the outputs corresponding to input points that are not grid nodes.

2. Grid the target color space $X$ to obtain an ordered collection of vectors $x_i$, $i = 1, 2, \ldots, h$ (each $x_i$ corresponds to a grid node). This will result in grid nodes that are outside of the printer gamut; these points must first be mapped onto the gamut using the approach described in Section 2.3.

3. Obtain the $y_i$ that solves Equation 2 for $x_i$ and $P$, for $i = 1, 2, \ldots, h$.

4. Select a multidimensional interpolation technique to obtain the inverse map outputs $y$ corresponding to input points $x$ that are not exactly grid nodes.

Several algorithms have been proposed for solving the optimization problem given by Equation 2. These algorithms can vary widely in terms of accuracy, numerical stability, speed, and complexity. Three of these algorithms are Shepard's Interpolation [3], Moving Matrix [2], and Iteratively Clustered Interpolation [1]. These algorithms were the subject of study in this thesis. The details of each algorithm are discussed in the following sections. Trilinear interpolation is used frequently by the ICI algorithm to interpolate through $P$ and $P^{-1}$; it is discussed at the end of this section.

4.1 Algorithms

The algorithms will be presented in an abstract sense, where $x$ represents colors in the target color space, $y$ represents colors in the printer's input color space, and $z$ represents colors in the printer's output color space.

For the algorithms discussed in this section, recall that the forward printer map $P$ contains $N$ entries and defines the mapping function: [...]

For the forward printer map used in this project, the printer's input color space was CMY, and its output color space was LAB. The algorithms can be applied to this specific application by remembering that $x$, $y$, and $z$ are equivalent to the color space vectors:

$$x = [L\ A\ B]$$

$$y = [C\ M\ Y]$$

$$z = [L\ A\ B]$$

4.1.1 Shepard's Algorithm

Shepard's algorithm is based on the work of Donald Shepard and is a well-established method for interpolating scattered data [3]. It is essentially an application of weighted averaging. The value for a given input point is calculated by a weighted average of all other data points, where the weighting is a function of the distance from the given point to the other data points.

Given a color $x$ in the target color space, Shepard's algorithm computes the inverse $\hat{y}$ by the following equation:

$$\hat{y} = \begin{cases} \dfrac{\sum_{j=1}^{N} y_j\, d_j^{-\mu}}{\sum_{j=1}^{N} d_j^{-\mu}} & \text{if } d_j \neq 0 \text{ for all } z_j \\[2ex] y_j & \text{if } d_j = 0 \text{ for some } z_j \end{cases}$$

where $d_j$ is the $L_p$ norm between the input point $x$ and $z_j$.

There are two variable parameters in this algorithm: $p$ and $\mu$. The $p$ parameter specifies how the distances between points are calculated. The $\mu$ parameter affects the locality of the weighting function; large values of $\mu$ result in more local behavior, which means that only those points closest to the input point will be significant.
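A minimal Python sketch of inverse-distance weighting in this setting is shown below. It assumes the conventional Shepard weight $d_j^{-\mu}$ with an $L_p$ distance, which matches the description of the two parameters above but is not copied from the thesis code; the function name, the table layout, and the default parameter values are all illustrative.

```python
def shepard_inverse(x, table, mu=2.0, p=2.0):
    """Estimate the printer input y for a target color x.

    table is a list of (y_j, z_j) pairs from the forward map, where y_j is a
    CMY triple and z_j is the measured LAB triple.
    """
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for y_j, z_j in table:
        # L_p distance between the target color and this table output.
        d = sum(abs(a - b) ** p for a, b in zip(x, z_j)) ** (1.0 / p)
        if d == 0.0:
            # The target coincides with a table output: return its input.
            return tuple(y_j)
        w = d ** (-mu)
        den += w
        for c in range(3):
            num[c] += w * y_j[c]
    return tuple(v / den for v in num)
```

Every table entry contributes to every estimate, so the cost per inverted point grows linearly with the number of entries in the forward map.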

4.1.2 Moving Matrix Algorithm

The Moving Matrix algorithm computes the printer inverse using linear weighted least-squares regression [2]. The inverse is given by the following equation, where $A$ is the transformation matrix.

$$y = x A^T \tag{3}$$

The transformation matrix $A$ is found by minimizing the weighted squared error given by the following equation:

$$E = \sum_{j=1}^{N} w_j \left\| y_j - z_j A^T \right\|^2 \tag{4}$$

Differentiating Equation 4 with respect to $A$, setting it equal to zero, and solving for $A$ yields the closed form solution:

$$A = \left( \sum_{j=1}^{N} w_j\, y_j^T z_j \right) \left( \sum_{j=1}^{N} w_j\, z_j^T z_j \right)^{-1}$$

The weighting is a function of the distances from the input point $x$ to all of the other points $z_j$ contained in the forward printer map:

$$w_j = \frac{1}{d_j^{\mu} + \epsilon}$$

Here $d_j$ is the Euclidean distance from the input point $x$ to the point $z_j$.

The variable parameters of this algorithm are $\mu$ and $\epsilon$. These parameters affect the locality of the regression. Large values of $\mu$ and small values of $\epsilon$ give $w_j$ more local behavior, which means that only those points closest to the input point will be significant.
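The weighted regression can be sketched in pure Python for the three-dimensional case. The sketch assumes a purely linear fit with no offset term, a weight of the form $1/(d_j^{\mu} + \epsilon)$, and the closed form $A = Q R^{-1}$ with $Q = \sum_j w_j y_j^T z_j$ and $R = \sum_j w_j z_j^T z_j$, which follows from minimizing the weighted squared error. The names `Q`, `R`, and all function names are this sketch's own, and the thesis implementation may differ.

```python
import math

def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def mat_mul(p, q):
    """Multiply two 3x3 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def moving_matrix_inverse(x, table, mu=2.0, eps=1e-3):
    """Estimate the printer input y for target x by weighted least squares.

    table is a list of (y_j, z_j) pairs; a new matrix A is fitted for every
    target x, which is why the matrix is said to be moving.
    """
    Q = [[0.0] * 3 for _ in range(3)]
    R = [[0.0] * 3 for _ in range(3)]
    for y_j, z_j in table:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z_j)))
        w = 1.0 / (d ** mu + eps)
        for r in range(3):
            for c in range(3):
                Q[r][c] += w * y_j[r] * z_j[c]
                R[r][c] += w * z_j[r] * z_j[c]
    A = mat_mul(Q, inverse_3x3(R))
    # y = x A^T, written out per component.
    return tuple(sum(A[r][c] * x[c] for c in range(3)) for r in range(3))
```

When the table data are exactly linear, the fit recovers the underlying matrix regardless of the weights; the weights only matter once the forward map is nonlinear.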

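4.1.3 Iteratively Clustered Interpolation Algorithm

The Iteratively Clustered Interpolation (ICI) algorithm is a gradient-based optimization method that uses an iterative technique to generate initial points [1].

An unconstrained gradient-based optimization algorithm to solve Equation 2 would be given by [11]:

$$y(k+1) = y(k) - \beta \left[ \frac{\partial\, \Delta E(y, x)^2}{\partial y} \right]_{y = y(k)}, \quad k \geq 0, \text{ for a given } y(0) \tag{5}$$

Note that $\Delta E(y, x)$ is Equation 1, restated below:

$$\Delta E = \|z - x\| = \|P(y) - x\|$$

Thus,

$$\Delta E(y, x)^2 = \|P(y) - x\|^2 = (P(y) - x)^T (P(y) - x)$$

The gradient of $\Delta E(y, x)^2$ with respect to $y$ for a fixed $x$ is then:

$$\frac{\partial\, \Delta E(y, x)^2}{\partial y} = 2 J(y)^T (P(y) - x) \tag{6}$$

Substituting Equation 6 into Equation 5 gives the following update equation:

$$y(k+1) = y(k) - \mu J_k^T \left( P(y(k)) - x \right) \tag{7}$$

where $\mu = 2\beta$. $P(y(k)) - x$ is a component vector of the differences between the target color $x$ and the output color $z$ produced by interpolating $y(k)$ through the forward printer map $P$:

$$P(y(k)) - x = \begin{bmatrix} z_{k,0} - x_0 \\ z_{k,1} - x_1 \\ z_{k,2} - x_2 \end{bmatrix}$$

$J_k$ is the Jacobian of $(P(y(k)) - x)$ evaluated at $y = y(k)$, and is given by:

$$J_k = \begin{bmatrix} \dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} & \dfrac{\partial z_1}{\partial y_3} \\[2ex] \dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} & \dfrac{\partial z_2}{\partial y_3} \\[2ex] \dfrac{\partial z_3}{\partial y_1} & \dfrac{\partial z_3}{\partial y_2} & \dfrac{\partial z_3}{\partial y_3} \end{bmatrix}$$

This matrix is also known as the gradient matrix. It is calculated using numerical differentiation of the forward printer map $P$. $\partial z$ is an average of the forward and backward finite differences that result from separately varying each component of $y$ and interpolating through $P$.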
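One update of the gradient iteration, including the numerically differentiated gradient matrix, can be sketched in Python as follows. The sketch treats the forward map as an arbitrary callable `forward(y)` returning a three-component output; in the thesis this role is played by trilinear interpolation through $P$. The function names and the perturbation argument `delta` are illustrative. Averaging the forward and backward differences is the same as taking a central difference, which is what the code computes.

```python
def jacobian_central(forward, y, delta):
    """Numerically estimate J[i][j] = dz_i/dy_j at the point y.

    The average of the forward and backward differences equals the central
    difference (z(y + delta) - z(y - delta)) / (2 * delta).
    """
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        y_plus = list(y)
        y_minus = list(y)
        y_plus[j] += delta
        y_minus[j] -= delta
        z_plus = forward(y_plus)
        z_minus = forward(y_minus)
        for i in range(3):
            J[i][j] = (z_plus[i] - z_minus[i]) / (2.0 * delta)
    return J

def ici_step(forward, y, x, mu, delta):
    """Apply one update y(k+1) = y(k) - mu * J^T (P(y(k)) - x)."""
    z = forward(y)
    err = [z[i] - x[i] for i in range(3)]
    J = jacobian_central(forward, y, delta)
    return [y[j] - mu * sum(J[i][j] * err[i] for i in range(3))
            for j in range(3)]
```

Each step costs seven evaluations of the forward map: one for the current output and six for the perturbed points.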

Gradient-based optimization methods suffer from the drawback that a solution may be optimal only in a local sense, rather than being the global optimum. In order to avoid this situation, the initial estimate of the inverse must be close to the actual globally optimal solution. A novel and efficient procedure to determine a "good" initial estimate is presented in [1] and is re-stated as follows:

1. Search the $z_j$ points of $P$ for the one that is closest to the given target point $x$. Call this point $z_{aux}$.

2. Find the corresponding point $y_{aux}$. This is the point such that $z_{aux} = P(y_{aux})$. $y_{aux}$ is a node point in the grid of the $Y$ space, which is a point in the input space of the forward printer map $P$. This $y_{aux}$ is a coarse estimate for the inverse of $x$.

3. Select $M$ points in the neighborhood of $y_{aux}$; generate a cluster of $M$ points $(y_{aux1}, y_{aux2}, \ldots, y_{auxM})$ by moving along axes around $y_{aux}$. Map these $M$ points to obtain $z_{auxj} = P(y_{auxj})$, $j = 1, 2, \ldots, M$.

4. Find the closest of the $z_{auxj}$ points to the input $x$ and call this point $z_0$. Take the [...] $P(y_0)$.

After using this procedure to find the initial estimate $y(0)$, the update equation given by Equation 7 can be used. Define parameters $k_{max}$ and $\epsilon$ that are the stop criteria of the iteration of the update equation.

s specifiestheerrorthresholdat whichtheupdateiterationscanstop, suchthat

|P(y(*))-x|<*

(8)

kmax

is the maximum number ofiterations that algorithm will perform when computing

the inverse for a given point. It is useful to define this parameter as a stop criteria

becausenot every input pointwill result in an errorthatis belowthe threshold s. When

either Equation 8 is satisfied or k >

kmax,

the algorithm stops andy(k) is taken as the

solution.

The parameter $\delta$ for this algorithm specifies the perturbation used in the numerical differentiation method used to compute the gradient matrix.

The parameter $\mu$ should be selected in order to achieve fast convergence and to meet accuracy requirements. This parameter should be bounded by the following equation, as discussed in [1]:

$$0 < \mu < \frac{2}{[\ldots]}$$

4.1.4 Trilinear Interpolation

Trilinear interpolation of a look-up table is a three-dimensional geometric method for computing the output values of input points that are not contained in the look-up table [5]. For the sake of this discussion, the look-up table is either the forward or inverse printer map, and this interpolation technique is used with these maps for input points that are not grid points.

Trilinear interpolation consists of two steps: the extraction step and the interpolation step. In the extraction step, the sub-cube in the source color space that contains the input point is determined by a series of comparisons. The eight vertices of this cube are the lattice points in the source space, as shown in Figure 5.

Figure 5: Sub-cube with Lattice Points Determined During Extraction (vertices $p_{000}(x_0,y_0,z_0)$ through $p_{111}(x_1,y_1,z_1)$)
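The extraction step can be illustrated along a single axis with the following C sketch; applying it once per axis yields the indices of the sub-cube. The function name `find_cell` and the representation of the grid as an array of strictly increasing node coordinates are assumptions made for illustration.

```c
#include <stddef.h>

/* Returns the index i such that grid[i] <= v <= grid[i + 1], found by a
   series of comparisons, or -1 if v lies outside the grid.
   grid must hold n >= 2 strictly increasing node coordinates. */
int find_cell(const double *grid, size_t n, double v)
{
    if (v < grid[0] || v > grid[n - 1]) return -1;
    for (size_t i = 0; i + 2 < n; ++i) {
        if (v < grid[i + 1]) return (int)i;
    }
    return (int)(n - 2);   /* v lies in the last cell (or on its upper edge) */
}
```

A return value of -1 corresponds to an input point that lies outside the input space of the look-up table, which a caller would need to treat as an error.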

The interpolation step consists of the repeated use of one-dimensional linear interpolation. Referring to Figure 6, a point $p$ on the curve between lattice points $p_0$ and $p_1$ is to be interpolated.

Figure 6: 1D Linear Interpolation

The interpolated value, $p_c(x)$, is linearly proportional to the ratio $(x - x_0)/(x_1 - x_0)$. Therefore,

$$p_c(x) = p(x_0) + \bigl[p(x_1) - p(x_0)\bigr]\,\frac{x - x_0}{x_1 - x_0}$$

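Three-dimensional trilinear interpolation consists of seven linear interpolations on the sub-cube depicted in Figure 5. This results in the following equations [5]:

$$
\begin{aligned}
\Delta x &= x - x_0, \qquad \Delta y = y - y_0, \qquad \Delta z = z - z_0 \\
c_0 &= p_{000} \\
c_1 &= (p_{100} - p_{000}) / (x_1 - x_0) \\
c_2 &= (p_{010} - p_{000}) / (y_1 - y_0) \\
c_3 &= (p_{001} - p_{000}) / (z_1 - z_0) \\
c_4 &= (p_{110} - p_{010} - p_{100} + p_{000}) / [(x_1 - x_0)(y_1 - y_0)] \\
c_5 &= (p_{101} - p_{001} - p_{100} + p_{000}) / [(x_1 - x_0)(z_1 - z_0)] \\
c_6 &= (p_{011} - p_{001} - p_{010} + p_{000}) / [(y_1 - y_0)(z_1 - z_0)] \\
c_7 &= (p_{111} - p_{011} - p_{101} - p_{110} + p_{100} + p_{001} + p_{010} - p_{000}) / [(x_1 - x_0)(y_1 - y_0)(z_1 - z_0)] \\
p(x,y,z) &= c_0 + c_1\Delta x + c_2\Delta y + c_3\Delta z + c_4\Delta x\Delta y + c_5\Delta x\Delta z + c_6\Delta y\Delta z + c_7\Delta x\Delta y\Delta z
\end{aligned}
$$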
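The coefficient form of trilinear interpolation translates directly into C. The sketch below is illustrative only: the function name `trilinear` and the array layout `p[i][j][k]` (value at vertex $(x_i, y_j, z_k)$) are assumptions, and the function interpolates a single output component, so it would be called once per output channel.

```c
/* Trilinear interpolation of one output component inside a sub-cube.
   p[i][j][k] is the value at vertex (x_i, y_j, z_k), with i, j, k in {0, 1}.
   The point (x, y, z) is assumed to lie inside the sub-cube. */
double trilinear(double p[2][2][2],
                 double x0, double x1, double y0, double y1,
                 double z0, double z1,
                 double x, double y, double z)
{
    double dx = x - x0, dy = y - y0, dz = z - z0;
    double wx = x1 - x0, wy = y1 - y0, wz = z1 - z0;

    double c0 = p[0][0][0];
    double c1 = (p[1][0][0] - p[0][0][0]) / wx;
    double c2 = (p[0][1][0] - p[0][0][0]) / wy;
    double c3 = (p[0][0][1] - p[0][0][0]) / wz;
    double c4 = (p[1][1][0] - p[0][1][0] - p[1][0][0] + p[0][0][0]) / (wx * wy);
    double c5 = (p[1][0][1] - p[0][0][1] - p[1][0][0] + p[0][0][0]) / (wx * wz);
    double c6 = (p[0][1][1] - p[0][0][1] - p[0][1][0] + p[0][0][0]) / (wy * wz);
    double c7 = (p[1][1][1] - p[0][1][1] - p[1][0][1] - p[1][1][0]
                 + p[1][0][0] + p[0][0][1] + p[0][1][0] - p[0][0][0])
                / (wx * wy * wz);

    return c0 + c1 * dx + c2 * dy + c3 * dz
         + c4 * dx * dy + c5 * dx * dz + c6 * dy * dz
         + c7 * dx * dy * dz;
}
```

Two quick sanity checks follow from the equations: at any vertex the result equals the stored vertex value, and for vertex data sampled from a function that is linear in $x$, $y$ and $z$ the coefficients $c_4$ through $c_7$ vanish.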

5 Software Implementation

Software programs implementing the Shepard's, Moving Matrix, and ICI algorithms were written in the C programming language. The programs read data files that contained the forward printer map and the gamut-mapped input points of the inverse map. The programs executed the algorithms on this data to compute the inverse for the given input points; this output data was written to separate data files for later analysis. The forward printer map was represented by two data files: one had $13^3$ CMY entries, and the other had the corresponding $13^3$ LAB entries. This experimental printer data was obtained from Dr. L. K. Mestha from Xerox.

5.1 Algorithm Metrics

These software implementations and simulations provided relative accuracy, execution time, and computational complexity information for each algorithm.

5.1.1 Determining Algorithm Complexity

The complexity of the algorithms was measured objectively by including code to count the number of multiplication/division and square root operations performed to compute the inverse. These operations are costly in hardware because the modules that perform them are both large and slow compared to other operations. The algorithms were also evaluated subjectively with respect to the required arithmetic modules, the amount of parallelism that might be exploited, and the complexity of the required control logic. A combination of the objective and subjective complexity was used to rate each algorithm's overall complexity relative to the others.

5.1.2 Determining Algorithm Execution Speed

The csh command interpreter running on the Unix platform has a built-in utility that can be used to determine the amount of time a program is executing on the system. The execution times of the programs were measured using this utility, which is an executable named time. In order to form a valid comparison, the programs were executed on the same machine under the same environment. Program development and execution was on Rochester Institute of Technology's Grace computing system. Grace is an Alpha 4100 5/533 with 3 EV5.6 CPUs. It has 1.5 GB of memory and is running Tru64 UNIX, version 4.0F.

5.1.3 Determining Algorithm Accuracy

The programs computed the inverse for the $13^3$ gamut-mapped LAB points $x$ using the forward printer map data. The completion of the programs resulted in a data file containing $13^3$ CMY entries that defined the inverse $y$, one entry $y_j$ for each input point $x_j$.

Each point $y_j$ was interpolated through the forward printer map $P$ to find the resulting $z_j$. The differences $\Delta E_j$ between $z_j$ and the corresponding $x_j$ such that $y_j = P^{-1}(x_j)$ were computed. This can be shown graphically as:

$$x_j \;\xrightarrow{\;P^{-1}\;}\; y_j \;\xrightarrow{\;P\;}\; z_j, \qquad \Delta E_j = \lvert z_j - x_j \rvert$$

The mean error and standard deviation, the minimum error, and the maximum error were calculated from the $\Delta E_j$; these statistics represented the accuracy of the algorithms. It should be noted that these error statistics were computed using the node points of the inverse map as inputs. The accuracy metrics of each algorithm were used to compare the algorithms in terms of accuracy. Separate programs were written to perform the interpolation through $P$ and to compute the accuracy metrics; these are also included on the CD accompanying this paper.

5.2 Optimal Parameters

Each algorithm was executed several times while varying the algorithm parameters. Accuracy metrics for each trial were measured to gain insight into how the parameters affect the accuracy. These trials also led to the values of the parameters that result in the best accuracy. Tables showing the accuracy data for different parameter values for each algorithm are shown in the Appendix (see Section 9.1). The results of these trials are summarized below.

5.2.1 Shepard's Algorithm

For Shepard's algorithm, there are two variable parameters, $p$ and $\mu$. The $p$ parameter specifies how the distances between points are calculated. The trials showed that the value of $p$ does not have much effect on the accuracy. Therefore, $p = 2$ was taken because this is the most convenient, as it results in a distance measurement that is simply the Euclidean distance. The $\mu$ parameter affects the locality of the weighting function. The simulation results show that $\mu = 5$ provides the best accuracy.

5.2.2 Moving Matrix Algorithm

The variable parameters in the Moving Matrix algorithm are $\mu$ and $\varepsilon$. Similar to the same parameter in Shepard's algorithm, the $\mu$ parameter affects the locality of the regression. Simulations showed that $\mu = 5$ resulted in the best accuracy. The $\varepsilon$ parameter does not have a significant effect on the accuracy and is taken as a small value to avoid ill-conditioning of the Si matrix. In these simulations, $\varepsilon$ was taken as $10^{-4}$.

5.2.3 ICI Algorithm

The ICI algorithm has parameters $\varepsilon$, $k_{max}$, $\delta$, and $\mu$. $\varepsilon$ and $k_{max}$ are the stop criteria; $\varepsilon$ specifies the error threshold at which the update iterations can stop, and $k_{max}$ is the maximum number of iterations that the algorithm will perform when computing the inverse for a given point. The parameter $\delta$ for this algorithm specifies the perturbation used in the numerical differentiation method used to compute the gradient matrix. The parameter $\mu$ should be selected in order to achieve fast convergence and to meet accuracy requirements.

The software simulations showed that the mean error for this algorithm can be reduced by decreasing the error threshold $\varepsilon$. In order to achieve fast convergence, $\varepsilon$ was taken as 0.5. This is below the "just noticeable difference" of $\Delta E = 1.0$. Other simulations have shown that even better accuracy can be obtained by further reducing $\varepsilon$ [4]. The software simulations also showed that the best choice of $k_{max}$ is 50. Increasing this parameter beyond 50 does not result in substantial gains in accuracy; if the algorithm is going to converge for a given input point, it will do so within 50 iterations. The perturbation $\delta$ was determined not to have any measurable effect on the accuracy of the algorithm; $\delta$ was taken to be 10, which is approximately a 4% variance of the printer's input color components ($0 \le C, M, Y \le 255$ and $10/255 \approx 0.04$) and is more or less arbitrary. Finally, the simulations showed $\mu = 4$ to be a good choice in terms of accuracy and convergence.

Table 1 shows the optimal parameters found for each algorithm.

Table 1: Optimal Algorithm Parameters

Algorithm        Parameters
Shepard's        p = 2, μ = 5
Moving Matrix    ε = 10^-4
ICI              ε = 0.5, δ = 10

5.3 Simulation Results

The accuracy, speed, and complexity of each algorithm are summarized in Tables 2, 3, and 4, respectively. The metrics are obtained from the simulations using optimal algorithm parameters and operating on the same data and on the same machine in the same environment. All measurements are made for the algorithms computing the $13^3$ inverse map from the $13^3$ forward map.

Table 2: Algorithm Accuracy

Algorithm        Mean ΔE   Std Dev   Mean + 2*Std Dev   Min    Max
Shepard's        1.64      0.54      2.73               0.15   3.81
Moving Matrix    1.28      2.29      5.86               0.02   54.88
ICI              0.39      0.12      0.63               0.05   1.68

Table 3: Algorithm Execution Time

Algorithm        Execution Time (sec)
Shepard's        12
Moving Matrix    17
ICI              7

Table 4: Algorithm Complexity

Algorithm        Complexity
Shepard's        LOW
Moving Matrix    HIGH
ICI              MEDIUM

ICI is clearly the winner in terms of accuracy and execution speed. A brief discussion of each algorithm's complexity follows.

Moving Matrix has significantly more multiplications than Shepard's and approximately […] slow design. Moving Matrix also would require complex control logic to perform matrix operations such as matrix inversion.

Shepard's algorithm would be quite easy to implement because it requires a small number of resources and is data flow oriented, thus requiring very little control logic. However, there are a large number of multiplications (three times that of ICI), which will result in a slow design. Even if more resources were used to perform some of the processing in parallel (which would increase the design area), it is still unlikely that it would beat out ICI in terms of speed. Although easy to implement, Shepard's is inaccurate and would most likely be slower in hardware than ICI.

ICI has the smallest number of multiplications, which could result in a less complex circuit. However, ICI has two significant drawbacks. First, this algorithm requires trilinear interpolation, which itself is a non-trivial hardware implementation. Second, ICI will require control logic for computing the gradient matrix and for computing the initial estimate by the clustering technique.

Of the three algorithms, ICI is the best in terms of speed, accuracy, and hardware complexity. This algorithm was chosen for hardware implementation in VHDL. The […]

6 ICI Hardware Implementation

NOTE: All source VHDL files are located on the accompanying CD under the "Hardware/VHDL Source Code" folder. The area and timing reports for each sub-module described in this section can also be found on the accompanying CD under the "Hardware/Reports" folder. VHDL source code is also shown in the Appendix (see Section 9.2).

6.1 Design Methodology

The algorithm was implemented using VHDL. The design was simulated with Mentor Graphics' ModelSim; the simulation results were compared with results obtained from software models operating on the same input data. Correct functionality of the design was verified, since the hardware simulation results matched those of the software models.

The hardware modules were synthesized using Synopsys's Design Compiler and were targeted to a generic library provided with the Synopsys tools. The resulting synthesis results provide a net-list that can be used for ASIC fabrication.

Several important design decisions were made about the system partitioning and architecture, data storage, and data representation. These decisions will be described in the following sections.

6.1.1 System Partitioning and Architecture

The ICI algorithm makes extensive use of trilinear interpolation. Trilinear interpolation is a geometric method to calculate the output of a 3D function at a given input point, where the 3D function is defined by a finite set of input and output pairs. A detailed discussion of trilinear interpolation can be found in Section 4.1.4. An entirely separate module was designed to perform trilinear interpolation; this module is used by the ICI design and can also be used in other applications.

Both the trilinear interpolation module and ICI module were designed using datapath blocks and control blocks. This approach allows for highly controlled use of the arithmetic units that make up the datapaths. The advantage of having such control over these units is that operations can be pipelined and the units can be shared where appropriate.

6.1.2 Data Storage

There are six values for each entry in the forward printer map $P$: one for each of the input space components and one for each of the output space components. For example, if $P$ defines a mapping from CMY to LAB, then there is a value for C, M, and Y for the input color space, and a value for L, A, and B for the output color space. An $N^3$ look-up table requires $6N^3$ bytes of storage. For $N = 13$, this results in 13,182 bytes of required storage.

Synthesis tools are still incredibly inefficient at synthesizing memory; therefore, this design assumes that an external memory module exists that contains $P$. The memory module in Figure 7 is assumed for design purposes. The interpolation and ICI modules are designed to interface with this memory module.

Figure 7: Memory Module Diagram

6.1.2.1 Operation

The memory module is synchronous and clocked by the clk input signal; all reads and writes occur on the rising edge of clk and complete in one clock cycle. The operation of the module is described below.

(a) Read: Reads occur when wr is low; data_out becomes the value stored at the location specified by addr.

(b) Write: Writes occur when wr is high; the value on data_in is stored at the location specified by addr.

(c) Memory location: addr specifies the memory location to read from during reads and the location to write to during writes. Its value can range from zero to $6N^3 - 1$.

The interpolation and ICI modules that interface to this memory never write to it; the wr and data_in signals are included for completeness.

6.1.2.2 Memory Data Layout

It is assumed that the memory module contains the forward printer look-up table prior to using the functionality of the interpolation and ICI modules. Furthermore, it is assumed that the values are arranged in the memory module in the following fashion:

Memory Location    Contents
0                  C0
1                  M0
2                  Y0
3                  L0
4                  A0
5                  B0
6                  C1
7                  M1
8                  Y1
9                  L1
10                 A1
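The layout above implies a simple address calculation: entry $n$ of the look-up table occupies six consecutive locations starting at $6n$. The following C sketch illustrates this; the enumeration and function names are hypothetical, and how a three-dimensional grid index maps to the entry number $n$ is not specified here, so $n$ is taken as given.

```c
/* Offsets of the six values stored for each look-up table entry,
   in the order implied by the memory layout. */
enum lut_component {
    COMP_C = 0, COMP_M, COMP_Y,   /* input space (CMY)  */
    COMP_L, COMP_A, COMP_B,       /* output space (LAB) */
    VALUES_PER_ENTRY              /* = 6 */
};

/* Memory location of one component of entry number `entry`. */
unsigned long lut_address(unsigned long entry, enum lut_component comp)
{
    return (unsigned long)VALUES_PER_ENTRY * entry + (unsigned long)comp;
}

/* Number of memory locations needed for an N x N x N look-up table. */
unsigned long lut_size(unsigned long n)
{
    return (unsigned long)VALUES_PER_ENTRY * n * n * n;
}
```

For example, `lut_address(1, COMP_A)` gives location 10, matching the last row of the layout, and `lut_size(13)` gives 13,182, so valid addresses run from 0 to 13,181.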

6.1.3 Data Representation

The project specification required that two decimal places of accuracy be retained for the LAB values. This meant that the design would have to implement floating point operations. Floating point hardware is more complex and slower compared to fixed point hardware. Additionally, the design resources did not have floating-point libraries that contain pre-compiled and optimized floating-point modules. These libraries would either have to be purchased, or a significant amount of time and effort would be required to design custom floating-point arithmetic modules.

An alternative is to use fixed point arithmetic and scale all of the data to integers. This approach allows for the use of pre-compiled and optimized fixed point integer modules, which simplifies the design and reduces design time.

To preserve two decimal places, the data had to be scaled by a factor of 100. For example, 123.45 would be scaled as 123.45 * 100 = 12345. Integer addition and subtraction would occur normally. However, multiplications would require the result to be scaled down. This can be shown by the following equations:

Original Operation: $A \times B = AB$

With Scaled Data: $(100A) \times (100B) = 10000AB$

However, we only want $100AB$, which means this result must be scaled down by a factor of 100. This is the disadvantage of scaled fixed point arithmetic; multiplication operations must be followed by division. This disadvantage is a fair trade-off compared to the complexity involved in implementing a design to support floating-point numbers.
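The scaling scheme can be sketched in C as follows. The names `SCALE`, `to_fixed` and `fixed_mul` are hypothetical, and the use of a 64-bit intermediate product is an assumption of this sketch made to avoid overflow; it is not a statement about the bit widths used in the hardware design. The division truncates toward zero, so results are not rounded.

```c
#include <stdint.h>

#define SCALE 100   /* two decimal places */

/* Convert a real value to its scaled integer form, rounding to the
   nearest integer (e.g. 123.45 -> 12345). */
int32_t to_fixed(double v)
{
    return (int32_t)(v * SCALE + (v >= 0.0 ? 0.5 : -0.5));
}

/* Multiply two scaled values. The raw product carries a factor of
   SCALE * SCALE, so it is divided by SCALE to restore the scaling. */
int32_t fixed_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product / SCALE);
}
```

With this sketch, 1.50 and 2.50 are stored as 150 and 250; their raw product is 37,500, and dividing by 100 gives 375, which represents 3.75. Addition and subtraction need no correction because both operands carry the same scale factor.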

It was necessary to determine if using this integer-based approach would compromise the accuracy of the ICI algorithm. The ICI C language implementation files were modified to use only integer data in order to investigate the issue. The integer-based ICI C code is located on the accompanying CD under the "Software/integer-based ICI" folder. Simulation results showed that the accuracy of the integer-based implementation was equal to its floating-point-based counterpart.

6.2 Synthesis Methodology

The control blocks of the trilinear interpolation and ICI modules are the only synchronous (clocked) blocks in each of the designs. These blocks were synthesized for speed, and the maximum path delay through them defined the clock and maximum operating frequency for the entire design. All other blocks are asynchronous, and their maximum path delays are either shorter or longer than the defined clock period. For those blocks that have a shorter maximum path delay, there is no concern because the circuit will provide valid results within the clock period. For those blocks that have a longer maximum path delay, the control blocks must either wait the appropriate number of clock cycles for the results to become valid or continue on with processing. In the latter case, computation in the block would continue as the control block would proceed with other […] the logic flow. This concept of parallel computation motivated the design of several separate datapaths that perform different functions. Wherever this type of parallelism could not be exploited, the control block simply waits the appropriate number of clock cycles for valid data.

There were no design specifications, such as area or speed, provided by the project requirements for the hardware implementation. However, the design has application in real-time and near real-time applications. It was also designed with ASIC implementation in mind. For these reasons, the design blocks were synthesized for the fastest possible performance. This is a reasonable synthesis constraint since the goal is real-time computation, and the ASIC platform allows for larger and more complex designs as compared to programmable logic, such as field programmable gate arrays (FPGA).

The asynchronous blocks were constrained by specifying input and output delays equal to the defined clock. These constraints forced the synthesis tool to generate the fastest possible design, but one that would never meet the timing constraints because of the complexity of the operations being performed. Several of the synthesized blocks' timing reports clearly show that they do not meet timing constraints. This is acceptable because in this design methodology it is understood that these blocks require more than one clock cycle.

6.3 Simulation & Testing

The VHDL code was simulated using Mentor Graphics' ModelSim tool. Testbenches were written in order to test and verify correct functionality of the hardware design. These testbenches read input data from files, fed it to the module under test, and wrote the output data into another data file. Other inputs were also controlled by the testbenches; these inputs, such as algorithm parameters, were the same as those used in the C language software simulations. The output data files written by the testbenches were compared to the output data files written by the C software models. Correct functionality of the VHDL design was verified because the data contained in the output files matched. The VHDL source code for the testbenches can be found in the […]

6.4 Hardware Modules

6.4.1 Trilinear Interpolation Module (trijop.vhd)

Figure 8 shows the I/O signals of the trilinear interpolation module, and Figure 9 shows the block diagram of the module. Input and output signals are described in Table 5.


Table 5: Trilinear Interpolation Module I/O Signals

Signal Name   I/O      Type                      Description
Clk           Input    std_logic                 System clock.
Memin         Input    std_logic_vector[31:0]    Input data from external memory.
Start         Input    std_logic                 Interpolation operation begins on the rising edge of this signal.
X0            Input    std_logic_vector[31:0]    Specifies the first component of the input point.
X1            Input    std_logic_vector[31:0]    Specifies the second component of the input point.
X2            Input    std_logic_vector[31:0]    Specifies the third component of the input point.
Done          Output   std_logic                 Signals the completion of the interpolation operation and valid output.
Error         Output   std_logic                 Signals an error occurred during the interpolation operation. Note: An error can occur if the input point specified by X0, X1, and X2 is located outside of the input space defined in the LUT.
Memaddr       Output   std_logic_vector[31:0]    Specifies the location of the desired data in memory.
Z0            Output   std_logic_vector[31:0]    First component of the output value; valid when done=1 and error=0.
Z1            Output   std_logic_vector[31:0]    Second component of the output value; valid when done=1 and error=0.
Z2            Output   std_logic_vector[31:0]    Third component of the output value; valid when done=1 and error=0.

6.4.1.1 Operation

The system clock must have a period of 40 nanoseconds or greater; this corresponds to a maximum operating frequency of 25 MHz. Simulation results show that this design can […]

The rising edge of start begins interpolation at the input point specified by signals X0, X1 and X2. Done is asserted when interpolation is complete. If error is not raised, valid output occurs on signals Z0, Z1 and Z2. Memin and memaddr are interface signals to the external memory module. Memin is data coming from the memory; memaddr is the location in memory of desired data and is controlled by the interpolation module's control block.

6.4.1.2 Sub-blocks

6.4.1.2.1 TRI_CTRL_BLOCK (trictrl.vhd)

This block contains the control logic needed to carry out the interpolation. Its input and output signals interface to the other sub-blocks and to the external memory module to control data flow. It is also responsible for beginning the interpolation when start is raised and for asserting done and error when the interpolation is complete.

The top level diagram is shown in Figure 10. A lower level block diagram is not available because it contains too many low-level components (primarily registers) and […]


Figure 10: Top Level Diagram of Trilinear Interpolation TRI_CTRL Block

The area of this block is 18,148 cells. The maximum path delay through this block is […]

6.4.1.2.2 TRI_ADDR_BLOCK (triaddrblock.vhd)

This block defines the datapath for calculating the memory address of a desired value in memory. The data needed from memory are the lattice points of the cell containing the input point and their corresponding points in the output. These lattice points are located by indices which were determined by searching the input space for the appropriate cell containing the input point. These indices are input to this block, which computes the memory address of the desired values.

The top level diagram showing input and output signals of this block can be found in Figure 11. A low level block diagram is shown in Figure 12.


Figure 12: Datapath Architecture of TRI_ADDR_BLOCK

The area of this block is 20,014 cells. The maximum path delay through this block is 38.87 nanoseconds.

6.4.1.2.3 TRI_CX_BLOCK (tri_cx_block.vhd)

This block defines the datapath for calculating the C1 through C7 values (henceforth known as CX values) used to calculate the output. These values are calculated from the […] is needed to compute all seven values because these units are shared due to pipelining of operations by the control block.

The top level diagram showing input and output signals of this block can be found in Figure 13. A low level block diagram is shown in Figure 14.

Figure 13: Top Level Diagram of Trilinear Interpolation TRI_CX_BLOCK

The area of this block is 3,658 cells. The maximum path delay through this block is 10.36 nanoseconds.

6.4.1.2.4 TRI_TERM_BLOCK (tritermblock.vhd)

This block defines the datapath for calculating the interpolated values; it is the workhorse of the interpolation module. It uses the CX values produced by the TRI_CX_BLOCK and several other inputs provided by the control block to compute the interpolated values.

The top level diagram showing input and output signals of this block can be found in Figure 15. A low level block diagram is shown in Figure 16. The disadvantage of using scaled data can be seen in this low level diagram; there are divide units throughout the datapath that are needed to scale down the results of the multiplication-rich calculation performed by this block. These divide units add significant area and delays to th
