Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
1990
Reasoning with visual knowledge in an object
recognition system
Christine Wojnowski
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
Recommended Citation
Rochester Institute of Technology
School of Computer Science and Technology
Reasoning with Visual Knowledge in an
Object Recognition System
by
Christine Wojnowski
A thesis, submitted to
The Faculty of the School of Computer Science and Technology,
in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science
Approved by:
Professor J. A. Biles
Dr. H. Rhody
Dr. P. Anderson
Title of Thesis:
_L-"i~~.PJ.
...
AdbuIi::::U....Ilo4.""".J.
'.lJf~::""-"'''''14...u-t::Ha.L.lo..:1....D'--...I:1;j~/-='S1~1 oI6.a..c..L~--'k....J..L.I"
...
r.7-1dod-'=J<..o""'--'~9-'(
_
_
~--'='-''''---<:::(k''''-'',...=----lo.oQ~b:=t.;~i\'l-;....I.c::::t..:1...-_Qo
A'
(fl
/6
'OK.?
74iL.llJJ)-e~.KyJd..t:!=~----I,
, prefer to be contacted each time a request for a
ABSTRACT
The impact
of artificialintelligence
on computer visionhas
provided variousperspectives and approaches
to
solving
problems ofthe
human
visual system.Some
ofthe
symbolicprocessing
andknowledge-based
techniques
implemented
in
visionsystems represent a meaningful extensionto the
low-level,
algorithmicprocessing
whichhas been
emphasized sincethe
advent ofthe
computer visionfield.
The higher-level
processes attemptto
capturethe
essence of visualcognition,
specifically
by
encompassing
a model ofthe
visual world andthe
reasoning
processesthat
manipulatethis
stored visualknowledge
and environmental cues.This
thesis
includes
adiscussion
ofexisting
computer visionsystems surveyed
from
ahigh-level
perspective.The
goal ofthis thesis
is
to
develop
ahigh-level inference
systemthat
implements reasoning
processes and utilizes a visualmemory
modelto
achieve object recognitionin
a specificdomain.
The
focus is
onsymbolically representing
andreasoning
withhigh-level
knowledge
using
aframe-based
approach.The
organization andstructuring
ofdomain
knowledge,
reasoning
processes and control and search strategies areemphasized.
The implementation
utilizesaframe
package writtenin Prolog.
Computing
Review
Codes:
1.2
Artificial
Intelligence
1.2.4
Knowledge
Representation
Formalisms
andMethods
1.2.8
Problem
Solving
Control
Methods
andSearch
1.2.10
Vision
andScene
Understanding
TABLE
OF CONTENTS
1.
Introduction
1
2.
Background
3
2.1
Computer
Vision
andthe
Human Visual System
3
2.2
Early
Computer
Vision
Systems
6
2.3
Existing
Computer Vision
Systems from
aHigh-Level Perspective
9
2.3.1
VISIONS
9
2.3.2
A
Query-Based
System
14
2.3.3
Schema-Based Systems
16
2.3.4
A Constraint Propagation Approach
18
2.3.5
SIGMA
20
2.3.6
Evidential
Reasoning
in
anObject
Recognition
System
22
2.3.7
A TMS Approach
to
Computer Vision
24
3.
Conceptual Issues
in
High-Level Vision
27
3.1
Visual
Knowledge Representation
andOrganization
27
3.2
Reasoning
Issues
ofVisual Interpretation
30
3.2.1
Control
andSearch
Strategies
30
4.
Implementation
36
4. 1
Conceptual Overview
ofHigh-Level Inference System
36
4.1.1
Knowlege Base
Design
36
4.1.2
The Process
ofObject
Recognition
38
4.2
Functional Specification
44
4.3
System Modules
47
5.
Results
51
5.1
Example Session
withthe
High-Level
Inference
System
51
5.2
Limitations
ofthe
High-Level Inference System
55
5.3
Test
Plan
58
5.4
Analysis
ofResults
60
Conclusion
65
Bibliography
Appendix A
Files from Example Session
Appendix B
Summary
ofResults
CHAPTER 1
INTRODUCTION
The integration
of artificialintelligence
and computer visionhas
evolvedin
the
pastfew
decades,
resulting in
significantadvancesin
the
computer visionfield
and,
consequently,
morecapable vision systems.
The
impact
of artificialintelligence
on computer visionhas
providedvarious perspectives and approaches
to
solving
problems ofthe
human
visual system.Some
ofthe
symbolicprocessing
andknowledge-based
techniques
implemented
in
visionsystems representa meaningful extension of
the
low-level,
algorithmicprocessing,
whichhas been
emphasized sincethe
advent ofthe
computer visionfield. The
higher-level
processes attemptto
capturethe
essenceof
human
visualcognition,
specifically
by
encompassing
a model ofthe
visual world andthe
reasoning
processes thatmanipulatethis
stored visualknowledge
and environmental cues.The
goal ofthis thesis
is
to
develop
ahigh-level
inference
system with afocus
onsymbolically
representing
andreasoning
with a visualmemory
model.The
intent is
to
develop
the
high-level
portion of asystem,
whose goalis
object recognitionin
a specificdomain,
by
emphasizing
the
organization andstructuring
ofdomain
knowledge,
reasoning
processes,
andcontrol and search strategies.
The basic
premise ofthe
problem-solving
approach usedin
this
thesis
is
that
perceptionis
a processwhereby
hypotheses
are created about whatis
visualized andinformation
is
gatheredto
support or refutethese
hypotheses.
Chapter
Two
ofthis thesis
provides a general overview ofthe
aspects ofhuman
visionthat
relateto
high-level
computervisionandbriefly
reviewsearly
computer vision work.Also,
a
high-level
perspective.In
Chapter
Three,
the
conceptualissues
ofreasoning
with visualknowledge
are exploredin
terms
ofthe
systems surveyedin
Chapter
Two
withthe
objective ofestablishing
the
foundation for
the
implementation
presentedin
this thesis.
A
conceptualoverview of
the
method presentedin
this thesis
andthe
details
ofthe
implementation
areelaborated
in Chapter Four.
An
analysis ofthe
resultsofthe
implementation,
an example session withthehigh-level inference
system andtest
plan arediscussed in
Chapter
Five,
followed
by
the
Conclusion.
Appendix A
consists ofthefiles
createdduring
processing
ofthe
example sessionwith
the
high-level inference
system thatis discussed in Chapter Five.
A summary
ofthe
identification accuracy
ofthe test
cases and asummary
of results areincluded in
Appendix B.
Appendix C
consists of a guideto
using
the
high-level inference
system,
andAppendix D
includes
the
input
and outputfiles
ofthe test
cases referencedin Appendix
B,
plusthe
knowledge
CHAPTER 2
BACKGROUND
2.1
Computer Vision
andthe
Human Visual System
Evidence for
the
existence ofthe
capacity for
the
human
visualprocessto
reason aboutthe world,
apparently
subconsciously,
canbe
found
whenthe
human
visualsystemis
presentedwith a visual
illusion,
an occluded object within a particularcontext,
or an unfamiliar scene[Fisc87].
Unknowingly,
the
complexhuman
visualsystem generatesmultiple,
competing
objector scene
hypotheses
andeffortlessly
selectsthe
most appropriate one afterhaving knowledgeably
eliminated
the
alternatives,
thus
demonstrating
the
ability
to
"visually
perceive."In sequencing
the
stepsthehuman
visual system undergoeswhenit
attemptsto
interpret
an unfamiliarscene,
the
visual systemmay
alternatebetween
more than one possibleinterpretation
withthe
objectiveof
arriving
at a consistentone,
thus
suggesting
the
existence of anability
to
reason aboutwhatis
being
observed.For
instance,
in
describing
the
picturedepicted in Figure
2-1,
the
human
visual system
may
concludethat
a white squareis occluding
four
dark
circles[Fisc87,
229].
This,
in
turn,
contrastswiththe
interpretation
that the
whitesquaredoes
notactually
exist,
but
instead is
anillusion
(since it is
thesameintensity
asthe
background).
Reasoning
aboutthe
context of a scenethat
is distorted
or providesonly
partialinformation
to the
human
visual system suggests adependency
on a priorfamiliarity
withthe
specific context
in
orderto
identify
an object orinterpret
a scene accurately.In addition,
the
context-free cues obtained
from
animage
areusually
not sufficientto
interpret
the
image,
andor
interpret
a scene.This
vast amount ofknowledge is
acquired through experiences andapparently is
storedcompactly in
the
brain
to
be
accessedultimately
by
the
human
visual system.As Neisser asserts, "Visual
cognition,
then,
deals
withthe
processesby
which aperceived,
remembered,
andthought-about
worldis brought into
being
from
asunpromising
abeginning
asthe
retinalpatterns"
[Neis67,
4].
Figure
2-1 Subjective Contours
(Fischler,
M. A.
andFirschein,
O.,
Intelligence The
Eye.
the
Brain
andthe
Computer,
(c)
1987
by Addison-Wesley Publishing
Co.
Reprinted
by
permission of
Addison-Wesley Publishing
Co., Inc., Reading, MA)
Understanding
the
underlying
complexity
ofthe
human
visual system canbe
crucialin
providing
insight into successfully
developing
a computer vision system.However,
the emphasisplaced on such an
understanding
does
not reflect a general consensusamong
computer visionresearchers
[Lowe85,
7]
.The
human
visualsystemnormally
operateswith somedegree
oferror,
taking
in
varying
amounts of evidence.It is
afinely
tuned
systemthat
processes muchinformation
andtakes
into
considerationinformation
derived from
whatis
being
perceived,
however
incomplete,
along
withtheexpectations oftheobserver[Barr81].
Unfortunately,
it is
difficult
to
reflect on one's ownvisualsystem,
andhumans
often exhibit aninherent
trust
in
their
ability
to
perceive.This
in
turn
sometimes contributesto
alimited understanding
ofthe
human
VISUAL
MEMORY
(MODELS)
[MAGE
INPUT
PRE
PROCESSING
PRIMITIVE DESCRIPTIONS
(FEATURE
EXTRACTION)
ACQUIRE MODELS
SYMBOLIC
DESCRIPTIONS
ACCESS MODELS
SEMANTIC INTERPRETA
TION
NEW DESCRIPTIONS
NEW FEATURES
OUTPUT
DESCRIPTIONS!
Figure 2-2 Image
Understanding
System ("Characterization
andRequirements
ofComputer
Vision
Systems,"R.
Nevatia,
1978,
In
Computer
Vision
Systems.
Hanson
A.
R., Riseman,
E.
M.,
eds.(c)
1978
by
Academic
Press. Reprinted
by
permission.)
An
artificial visionsystemattemptsto
modelhuman
visualperceptionandcognition,
andit is
concernedprimarily
withproviding
a computer withthe
capabilities andintelligence
ofthe
human
visual system.Modeling
the
human
visual systemby developing
ageneral-purpose,
domain-independent
computer vision systemhas
provedto
be
adifficult
task
[Hans78a].
Knowledge-based
visionsystems,
as aresult,
have
been
developed
that
employ knowledge-based
techniques,
operate within a particulardomain
ofknowledge,
and aredesigned for
a specifictask.
arrays of values or a
line drawing.
The
output canbe
a symbolicdescription,
a semanticinterpretation,
or some other system-specificrepresentation.A
computer vision system canbe
partitionednaturally into
sequentiallevels
that
reflectthe
level
ofprocessing
and representationofknowledge (see
Figure
2-2)
[Barr81].
Low-level
(early)
processing includes
theprocessesthat
operate onthe
numericalarrays ofsensory
data,
andis
concerned withextracting information
directly
from
the
image.
It is
typified
by
data-driven
or
bottom-up
processing.High-level
(late
orcognitive) processing
provides the symbolicdescription
or semanticinterpretation
of ascene,
giventhe
information
generatedby
the
lower
levels,
andfocuses
onmodeling
one's expectations and specificdomain knowledge.
This
is
accomplished
by
the
selection and appropriateutilizationof suitableknowledge
representations,
effective object recognition
strategies,
and theimplementation
of control strategiesto
direct
theprocess of
obtaining
a symbolicdescription
of a scene.High-level
visionis
characterizedby
goal-directed
(top-down,
predictive)
analysis andthe
use ofsemantic models and relationships.Knowledge
aboutthedomain in
which a sceneis
obtainedis
usedextensively
throughout this
stage.
Intermediate-level processing
seeksto
bridge
the
gap
between
the
low
andhigh levels
andis
responsiblefor converting
the
resultsobtainedfrom
the
low level
into relatively
more abstractrepresentations
that
are requiredby
subsequentlevels
(see
Figure
2-3).
2.2
Early
Computer
Vision
Systems
The development
of computer vision systemsdates back
to
the1960's
whenL.
G.
Roberts's
programtook
animage
of a polyhedralblock
scene,
convertedit into
aline
drawing,
and recognized
the
image based
onthree-dimensional models of acube,
wedge andhexagonal
prism
[Robe65,
Barr81].
Processing
in
Roberts's
program wassequential,
with segmentationLow-level
Intermediate
High-level
Analysis
level
analysis analysisDeterniiiiation
Determination
o
of
local image
of genericproperties scene attributes
-
/\
Sensor
Smoothing
N Threshnlr|
ing
/ \
Edge
Elements
Regions
6
b
Color
Surfaces
Texture
Objects
Description
of scene
Signal
.Figure
2-3 Signals-to-Symbols
Paradigm
(Fischler,
M.
A.
andFirschein,
O.
Intelligence
The
Eye,
theBrain
andthe
Computer
(c)
1987
by Addison-Wesley Publishing
Co.
Reprinted
by
permissionofAddison-Wesley Publishing
Co.,
Inc.
Reading,
MA)
the
segmentation processis
to
partition animage into
meaningful components such as regions.Unfortunately,
segmentationis
not alwaysentirely
reliable andtherefore,
errors areintroduced
early in
processing
animage
[Barr81,
576].
These
errorsultimately
propagateto
theinterpretation phase,
resulting in
anincorrect
explanation ofthe
image.
Another
approachthat
arose
from
the
difficulties
encounteredin early
attemptsto
interpret
animage
entailedcombining
the
segmentation andinterpretation
processes[Barr81,
576].
The
segmentation phase wasdirected further
by
the
interpretation phase,
andthe
strict sense of sequentialprocessing
wasabandoned.
Falk's
program,
INTERPRET,
handled imperfect input
orline drawings
by
employing
segmenting
theline
drawing,
recognizing
objects asinstances
ofmodels,
predicting
aline
drawing
from
the
recognizedobjects,
andundergoing
verification,
which comparesthe
originalinput
to
the
predictedline drawing. Falk
employed ahypothesize-and-test
strategy along
withalarge
setof
heuristics
to
recognize objects.Similarly,
Guzman
relied onheuristics
to
achieve theobjectives of
his
program,
SEE,
which used a symbolic approach[Guzm68].
The
goal ofSEE
was
to
partition aline
drawing
into
three-dimensional
elementsusing
heuristic
knowledge
ofvertices
to
combineregions[Guzm68].
The
input
wasa symbolic representation ofpoints,
lines
and
surfaces,
andit
assumedpreprocessing
wascompleted.The
programgenerated outputin
the
form
oflists
identifying
the
bodies in
ascene,
but
it did
nothave
the
capability
ofhandling
missing
lines.
Guzman
accomplishedhis
goal withoutthe
use ofstored object models.Early
computer vision systems were characterizedby
simple,
restricted scenedomains
in
whichmodels of objects couldbe
constructedeasily
[Barr81,
574].
According
to
Barrow
andTenenbaum,
the
blocks
worldillustrates many
aspects of visual perception[Barr81,
578].
However,
a moredifficult
task
involves working
withreal world scenesthat
convey
meaning.More
recent computer vision systemsdeal
withdomains
ofthis context,
and as aresult,
knowledge
representation andreasoning
processeshave
become
more relevantissues in
developing
such systems.In
mostearly
computer vision workthere
was a predominance of numerical andalgorithmic methods used and an emphasis on
low-level
vision.As
thefield
of computer visionevolved and more sophisticated artificial
intelligence
techniques
weredeveloped,
more capableand comprehensive vision systems came
into
existence.Specifically,
there
was anemergence ofsymbolic
processing,
explicitreasoning techniques,
attemptsto
modeldomain knowledge
efficiently,
processesdealing
withuncertainty,
and effortsto
solve problemsby
employing
computer vision expanded and related
high-level
issues
were given more attention andconsideration.
2.3
Existing
Computer Vision Systems from
aHigh-Level
Perspective
The
following
computer vision systems and research efforts willbe
discussed in
terms
of
high-level
visionissues,
whichinclude knowledge base design
orhow
visualinformation
is
represented;
the
method ofreasoning employed-top-down,
bottom-up,
or a combination ofboth;
the
controlstrategy
used;
the searchstrategy
as requiredby
the
recognitionprocess;
andthe
uncertainty
handling
mechanism,
if
any.In
otherwords,
the
structure ofthe
system'sinterpretation
scheme andproblem-solving
approach willbe
examined.The
purpose ofthe
following
sectionsis
to
provide asampling
of variouscommonly
used and some not-so-commonapproaches
to
high-level
vision.It
is
notintended
to
be
comprehensivein
scopebut,
instead,
is
intended
to
illustrate
andexemplify
the
high-level
aspects ofdeveloping
a computer visionsystem.
Computer
vision encompasses systems whose goalsinclude
objectrecognition,
image
understanding
and sceneinterpretation.
2.3.1
VISIONS
The
goal ofVISIONS,
aUniversity
ofMassachusetts
project underthe
development
ofAllen
Hanson
andEdward Riseman
sincethe
mid1970's,
is
to
generate a symbolic representationof athree-dimensional scene as portrayed
in
atwo-dimensional
image
[Hans78,
Hans88].
The
system employs
knowledge-based
techniques
for
the
interpretation
of naturalimages (house
androad
scenes)
and supportsthree
levels
of representation and processing.The
representationandcontrol strategies are
designed
to
be
modular andhierarchical,
andthe
overall structure ofthe
y-K_
HIGH
LEVEL"Symbolic DescriptionofObjeotsandScenes
Control Strategies
INFERENCE & PROPAGATION
OF BELIEF
FocusofAttention
Rule-Baaed Object
Hypotheses
Information Fusion
ObjectMatching
Information Fusion
ControlofPerceptual Organization
I
-v_INTERMEDIATE
SYMBOLIC
REPRESENTATION:
Symbolic DescriptionofRegions, lines, Surfaces
and othertokensextractedfrom the sensory data
PERCEPTUAL
ORGANIZATION
grouping, splitting,anc
deletingISS tokens
Segmentation FeatureExtraction
y
GoalOriented Resegmentation AdditionalFeat-urea Finer ResolutionLOW
LEVEL:
Pixels- ArrayB
ofIntensity.RGB.Depth
(staticmonocular, stereopairs,motionsequences)
STEREO AND MOTION TO
PRODUCE DEPTH ARRAYS
Figure
2-4
Overview
ofVISIONS ("The VISIONS Image
Understanding System",
by
Hanson,
A. R.
andRiseman,
E.
M.
In Advances in Computer Vision. Volume
1.
Brown,
Christopher,
ed.(c)
1988
Lawrence Erlbaum Associates.
Reprinted
by
permission.)
At
the
lowest
level
ofrepresentation,
operations are performed on numerical arrays ofsensory data. At
the
intermediate
level,
symbolictokens
representing regions,
lines
and surfacesare constructed
from
the
features
extracted atthe
low
level.
An intermediate
symbolicrepresentation of
the
image is
constructedusing
a segmentationprocedure,
and anincomplete
interpretation
is
then
built
by
labeling
intermediate
tokens.
Object labels
activate proceduralportions associated with
the
objectthat
is
hypothesized. The
proceduralcomponentsfurther
guideknowledge
base
representedatthe
high level is
referredto
aslong-term memory
andis
composedof asemantic network of
hierarchically
organizedschemas,
whichhave
a procedural componentincorporated
into
their
structure.The
proceduralcomponents,
which are comprised ofthe
interpretation
strategies,
allowinstances
of whatis
representedin long-term memory
to
be
constructedas a network
that
is
specifically
referredto
as short-term memory.The
interpretation
processin
VISIONS
consists offour
primary
components:representation of
knowledge,
the
processesthat
producethe
internal
model,
thecontrolstrategy
that
governsthese processes,
andthe
searchstrategy
through
long-term memory [Hans78b].
Interpretation
of animage in VISIONS
centers onbuilding
aninternal
modelthat
consists of adescription
of meaningful entities andtheir
relationshipsin
a scene.Various
types
ofinformation,
whichinclude
declarative,
relational andprocedural,
arerepresented
in
ahierarchical
structure as a semantic network of schema nodesin VISIONS.
A
schema
describes
an object andthe
relationsbetween
parts of an object.In
additionto this
descriptive
knowledge,
schemas also encode a procedural portionthat
details
the
interpretation
strategies,
or object-recognition schemes comprised ofhypothesis
and verification processes.Long-term
memory
reflectsthe
a prioriknowledge
ofthe
worldas general classes ofobjects,
andmodels of stereotypical scenes and one's expectations.
Long-term
memory
consists ofspecialization and composition
hierarchies built
by
using
IS-A
andPART-OF relations,
respectively.
There
arethree
ways a concept canbe
representedin
the
knowledge base:
as an objectthat
has
attributes associated withit,
whichareindependent
ofits
parts or thecontext of whichit may be
a partof;
as a part of aschema,
whereinformation
aboutits
relationshipsto the
larger
context
is
expressed;
and as a schemaitself,
whereits
parts and relationshipsbetween
parts areparts are
represented
as additional schemas or withinthe
schemafor
the
object[Hans88,
81].
If there
are complicated relationshipsbetween
parts of anobject,
then the object recognitionstrategy
generatesthe
hypotheses for
the
parts, otherwise,
an object partis
represented as aseparate
schema,
if recognizing
the
partis
nothighly
dependent
onthe
relationshipsbetween
the
partsof anobject.
The
object-dependentinterpretation
strategies attemptto
matchthe
expected entitiesin
a scene
to the
data
in
short-termmemory
andalso,
to
generate aninstance
of a schema.Short-term
memory,
therefore,
represents theinstances
of certain nodesin long-term
memory
andis
also referred
to
asthe
blackboard.
Short-term
memory
can alsobe
viewedas a set ofhypotheses
that
constitutes a partial model.The
interpretation
processes areimplemented
using
ablackboard
and canbe
runin
parallel on a multiprocessor with each schema
independently
accessing
the
blackboard [Hans88].
Knowledge
sources consist ofinterpretation
processes that exercise simple control and cannotdirectly
accessthe
blackboard, instead,
the
interpretation
strategies ofthe
schemasinvoke
the
knowledge
sources.Specifically,
the
interpretation
strategies canverify
or rejecthypotheses,
evaluate
the
validity
ofhypotheses,
extendhypotheses
by
combining
similardata
andhypotheses,
instantiate
schemas,
and activate schemas.It
is
the
intention
ofthedevelopers
ofVISIONS
to
devise knowledge
sources asthe
bases for interpretation
strategies,
andconsequently,
schemaswould
simply invoke
theappropriateknowledge
source.The interpretation
strategiesin
VISIONS
may
proceedin
atop-down
orbottom-up
manner.
The
various categories ofinterpretation
methodsinclude
ahypothesis
generation andextension
strategy,
astrategy
utilizing
geometricdata
and anotherstrategy
that
entailsthe
Direct
andindirect
communicationbetween
schemasis
possible withinVISIONS,
anddepends
onhow closely
objects are related.If
objects arehighly
related,
then
direct
communication
is
preferable.For
instance,
the
interpretation strategy for
ahouse halts
processing
whileit
waitsfor its
requested roof and wall(its
parts)
hypotheses,
which weredirectly
invoked.
If
objects are nothighly
related,
then their
interpretation
strategies proceedwith
their
processing
andindirectly
communicate with other schemas through theblackboard.
While
a schemais
active,
simultaneouspolling
ofthe
blackboard
for
hypotheses
can occur.As
the
schema systemis
being
revisedby
thedevelopers
ofVISIONS,
paralleldistributed
controlof schemas
is
being
examined.Another
majorissue in
the
development
of a computer vision systemthat
has been
explored
by
thedesigners
ofVISIONS is
inferencing
in
the context of uncertaininformation.
Due
to
the massive amounts ofinformation
that
needto
be
processedin constructing
aninterpretation
of animage,
the
ambiguitiesthat
existin
image
data,
andthe
fallible
low- andintermediate-level
processes,
it is inevitable
that
errorswillpropagateinsidiously
to the
high level
[Hans88,
Cohe82,
Barr81].
Therefore,
the
issues
that
needto
be
dealt
within
orderto
allowinferences
to
be
madeinclude
"the
representation ofuncertainty
withinthe
system,
the
mechanisms
by
which uncertaininformation is
combined,
andthe
mannerin
which uncertainevidence
may be
extendedthrough
inference"[Hans88,
83].
To
this
end,
the
ability
to
mergeuncertain evidence
from
various sources and theability
to
propagate confidencesthrough the
schema structure are
being
explored morethoroughly
in
VISIONS
[Hans88],
In
anearly
versionofVISIONS,
estimates ofconditionalprobabilitiesbetween
a schemaand each part were stored
in
the schemas,
and usedin
the
process ofhypothesis formation
[Hans78b]
.The
estimates werebased
on expectationsregarding
the
presence ofobjectsin
certainroad scene exists.
However,
given a roadscene,
it is
not aslikely
that
astop
signnecessarily
will
be
seen.A
dependency
graph,
or evidential-basedmodel,
has been
examined as abasis
for
reasoning
aboutimages
andrepresenting knowledge in VISIONS [Wesl82]. The
model,
referredto
asDGMES
(dependency
graphmodelsof evidentialsupport),
andthe
method ofreasoning
areconcerned with
combining
evidencefor
the
purpose ofdrawing
consistentinferences.
An
objective of
this
approachis
to
manage uncertain andinaccurate information
that
resultsin
ambiguous
interpretations.
The
approachto
reasoning
using
the
DGMES
model entailsgathering
positive or negativeevidence about
the
hypotheses
representedin
the
dependency
graph,
combining
the
evidence,
andpropagating
the
consequence of combined evidenceto
dependent hypotheses
[Wesl82,
15]. There
are
distinct
processesfor
combining
evidence anddrawing
inferences.
The
modelis
capable ofmaking
goal-directed anddata-directed
inferences
over abody
ofknowledge. Once
evidencehas
been
obtainedregarding
certaindomain
knowledge,
the
inference
engine will arrive at aconsistent
interpretation.
The fundamental
conceptsunderlying
thehigh-level processing
in
VISIONS,
as well asthe
following
systems,
have
contributedto the
conceptualfoundation
ofand,
therefore,
areintegral
to the
development
ofthe
high-level
inference
systempresentedin this
thesis.
2.3.2 A
Query-Based
System
Dana
Ballard,
Christopher Brown
andJerome Feldman
atthe
University
ofRochester
developed
acomputervisionsystemthat
specifically
respondsto
aquery
presentedto the
system[Ball78].
The
applicationdomains includes
locating
ribsin
a chestX-ray
andfinding
shipsin
aencodes
two-dimensional
domain
knowledge.
The
extentto
whichthe
system processesinformation
and conductsits
searchis determined
by
the
user's query.A sketchmap,
whichconsists of
instantiations
of the semantic networkmodel,
is
constructedin
the process ofanswering
aquery.The sketchmap
servesasthe
middlelayer
ofinformation between
the
image
data
structureandthe
model ofdomain
knowledge.
The image data
structure containsinformation
aboutthe
originalimage.
Mapping
procedures
effectively
createthe
mapping between
the
image data
structures andthe
sketchmap
by
generating
the
appropriate associations orlinks.
Executive
procedures allowfor
the
links
formed between
the
model andthe
sketchmap,
analogous, respectively,
to
long-term
andshort-term
memory in VISIONS.
In
additionto
declarative
knowledge,
the
nodesin
the
model alsocontain procedural
knowledge;
themapping
and executiveprocedures are attachedto
thenodesin
the
semantic network.Constraint
relationsbetween
nodes are encodedin
theform
oflinks
in
the semanticnetwork model.
The probability
that arelationship is true, along
withthe
value ofthe
relationship,
canbe
encodedin links
[Ball78,
273].
The
developers
ofthis
query-based systemdifferentiate between
thetype
ofknowledge
representedin
the
semantic networkthatultimately
can
be
instantiated,
versusthe type
ofknowledge
that
assistsin constructing
a sketchmap.The
former
type
ofknowledge is
representedin
the
network astemplate
nodes.These
nodes andthegeometric relations
between
them
form
a constraint network.The
nodesin
the
constraintnetworkcan
have location descriptors
which encodeknowledge
of whereto
find
an objectin
animage.
The
queriesfor
locating
a particular object are embodiedin
the
executiveprocedures,
which
in
turn
choosethe
most appropriatemapping
procedurethat
is
associated with a nodein
the
model.A mapping
procedurehas
aprecondition,
a cost estimate and anaccuracy
estimateselected and a node
is
then
searchedfor.
Basically,
it is
the
executive procedurethat
incorporates
the
controlstrategy for attaining
the
goal ofconstructing
asketchmap,
oranswering
a query.2.3.3 Schema-Based Systems
A
prominentfeature
of a vision systeminvolves
the
approachtoward
designing
a controlstrategy
to
guidetheinterpretation
process.A
problemthat
some approachesdo
notdeal
withadequately is derived from
having
to
hypothesize
a particular schemabefore
all availableknowledge
canbe
usedto
direct
the
searchfor
that
schema[Have83a,
Rao88].
The underlying
mechanisms of
the
controlstrategy in
Jay
Glicksman's
system, MISSEE
(Multiple
Information
Source
SEE),
is
thecycle of perception[Glic83].
Basically,
the three
step
cycle ofperception,
which
is illustrated in Figure
2-5,
is
a process where schemasdirect
explorationin
animage in
which objects are
sampled,
andthese
objectsin
animage,
in
turn,
modify
theoriginal schema[Neis76,
21].
Object
(available
information)
Modifies
Schema
/
Directs\
ExplorationFigure 2-5 The
Perceptual
Cycle (From
Cognition
andReality: Principles
andImplications
MISSEE is
aschema-basedsystemthat
exploitsvarioussourcesofinformation
and whosegoal
is
to
interpret
aerial photographs of urban scenes.The
purpose ofintegrating
multipleknowledge
sources asinput is
to
resolveambiguitiesthat
surfaceduring
the
interpretation
process.The
sources ofinput in MISSEE include
the
digitized
image,
asketchmap
andinformation
provided
by
the
user.MISSEE
alsocontainsasemantic network of schema nodes.The
schemashave
proceduralcomponents,
which are responsiblefor
instantiating
schemas,
top-down
invocation
ofschemas,
andbottom-up
processing
ofdata.
Schemas
communicatethrough
message
passing,
controlthe
interpretation
process and constructthe
final
representation.The
schemas are generated and usedby
MAIDS,
a schema manipulation system[Glic83,
Glic82]
.The
four
types
of attributes associatedwithschemas areLINK, VALUE,
PROCEDURE
and
CONFIDENCE.
Relationships between
schemas aredenoted
by
LINKs,
which canform
aspecialization
hierarchy,
adecomposition
hierarchy,
andinstances
of general classes.A
VALUE
attribute allows a place
for
a valueto
be
stored,
andif necessary,
it
can accommodatedefault
values.
When
the
valueis
added, removed,
needed orchanged,
attachedfunctions
canbe
activated.
The PROCEDURE
attribute allows an attached procedureto
be invoked
during
message passing.
The
CONFIDENCE
attribute assistsin
the
evaluation ofinterpretations. This
value can
be
modifiedby
an attached procedure when additional evidenceis found for
theobject.A
schema-basedsystem, according
to
Glicksman,
couldbe
aremedy
to
thedifficulties
a vision system
faces from
ambiguity,
whichoccurs whendata
canhave
multipleinterpretations;
incomplete
data,
orinsufficient
supportfor
aninterpretation;
andinconsistency,
whichis
aconsequence of
existing favorable
and unfavorable evidencefor
aninterpretation
[Glic82,
33].
Some
ofthe
effortsin
developing
computervisionsystemshave
been
directed
toward
andHave83b].
A
majorimpetus behind
these
effortshas been
theinadequacies
offormer
systemsand
the
difficulties
that
arisein
interpreting
animage due
to
incomplete
andinconsistent data.
The
objectiveofMapsee2,
a revision ofMapseel
developed
by
Alan Mackworth
andWilliam
Havens,
is
to
interpret
sketch maps[Mack81]. It is
a schema-based systemthat
employsdata-driven
(bottom-up)
and model-driven(top-down)
interpretation
strategies,
and emphasizesdecomposition
andspecialization relationshipsin its
network of schema nodes.A
schema modelrepresentsageneral class ofobjectswith attributes of class members and relationships
between
the
class and other schemas encoded withinit.
An instance
of a class of objectsis
created whenan object
is hypothesized.
An
objectis
recognizedby
recursively recognizing its
components,
thereby satisfying its
constraints[Have83b,
93].
Recognition,
therefore, is
basically
a search ofthe
compositionhierarchy
whichis
controlledby
the
procedures attachedto
the
schemas.After
a schema
has been
instantiated,
it
canbe
used as a cuefor
hypothesizing
a schema aboveit
(bottom-up
search)
or as a cuefor
invoking
a schemabelow it
(top-down)
in the
compositionhierarchy.
2.3.4
A Constraint
Propagation
Approach
In
additionto
model-drivenanddata-driven
methodsthat are employedduring
the
processof object recognition
to
obtain a suitable symbolicdescription
ofthe
input,
a constraint-basedapproach
has
alsobeen
pursued.Havens
proposes a constraint propagation approachthat
employs network
consistency
techniques
and utilizeshierarchically
structured schemas as aknowledge
representationformalism [Have85].
A motivating factor behind Havens's
efforts wasorganizing knowledge
sothat
it
canbe
effectively
representedandefficiently
appliedto
recognition[Have85,
127].
Havens
emphasizesschema
labeling
is
to
generate a networkconsistency
graph which consists of schemainstantiations
and relevant constraints that representthe
objectsidentified in
theinput.
The
knowledge base
consistsofschemasthat
representgeneral classes of objects andexplicitly include
relationships withother classes.
Composition
and specialization relationshipsbetween
schemasare used
to
form
constraintsbetween
associatedinstantiations in
the
networkconsistency
graph.The
validcombinationsof partsfor
a class of objects are capturedin
composition rules which areassociated witha schema.
An
object canbe
recognizedby
recognizing its
components sothat
constraints
for
the
class are satisfied.The
composition constraints canform
a compositionhierarchy
while a specializationhierarchy
is
representedby
specifying
subclasses ordescendants
of schemas.As
a result offorming
the
specializationhierarchy,
attributes and constraints areinherited
by
subclasses.A
labelset is
associated with a classthat
representsits
possiblelabels
ortypes.
The
constraints arepropagated
ultimately
throughthe
networkconsistency
graphby
the
implementation
of networkconsistency
algorithmsin
orderto
resolveambiguities[Have85].
Schemas
have
afairly
uniformstructure,
eachconsisting
of a set of compositionrules,
a set of
components,
a set of constraints and alabelset for
the
schema.The
terminal
nodes ofthe
specializationhierarchy
are referredto
as atomic classes whichhave
no compositionand arenot represented as schemas
in
the
knowledge base.
The knowledge
base,
also, does
not containany information
aboutinstances
ofobject classes.The
networkconsistency
graphis
adescription
that
is
constructedfrom
the
knowledge
base.
Inconsistent labels
that
do
notcomply
withthe
constraints are eliminated.As
the
compositionrules are
invoked,
new schemainstances
are created and new constraintsamong
the
instances
are established.The
process ofcomposing
anetworkconsistency
graphis driven
by
description.
Integrated
withinthe
compositionrules arealgorithmsfor maintaining consistency
and
propagating
constraintsin
the
network.2.3.5 SIGMA
In
computer visionsystemsthat
integrate both
bottom-up
andtop-down
searchstrategiessequentially,
adecision
mustbe
made aboutwhich methodto
use and whento
usethat
method.In
regardto this
predicament,
Larry
Davis
andS. S. Vincent
Hwang
developed
SIGMA
withthe
objective of
constructing
aflexible
reasoning
processthat
does
notrely
on a vast amount ofdomain
controlknowledge
[Hwan86,
Davi85].
SIGMA
is
animage
understanding
systemthat
consists of
three
components: alow-level
visionsystem,
ahigh-level
visionsystem and aquery-answering
module.An
image is
providedto
the
low-level
system whereit
undergoessegmentation,
and the results ofthis process are storedin
adatabase.
Models
of objects areprovided
to
SIGMA
by
theuser andthe
high-level
vision system usesthese
modelsto
interpret
image
structures orto
guide thelow-level
systemto
locate image
structures.The high-level
system
gradually builds
aninterpretation
ofthe
image
by
using
theobject modelsto
interpret
the
information
storedin
the
database.
A
goalis
providedto
thequery-answering
module andthe
process of
building
aninterpretation
continues untilthe
goalis
attainedorfurther interpretations
are not possible.
Among
the
many
possibleinterpretations
that
may be
generated,
SIGMA
usesthe
query-answering
moduleto
assistin
the
selection ofa suitable or adequateinterpretation.
The
high-level
system activatesthis
moduleto
matchthe
initial
goal withthe
interpretation
made sofar.
If
the
goalis
satisfied, the
high-level
system continues withthe
interpretation
process.The
developers
ofSIGMA
stressthe
integration
of relatedhypotheses
which underliesthe
processing
includes
the
creationofhypotheses
ofobjects,
thegrouping
ofrelatedhypotheses,
thegenerationof a
composite
hypothesis followed
by
the
verification ofhypotheses
[Hwan86,
325].
The
knowledge
representation schemeusedto
build
object modelsin SIGMA
consists offrames
andrelationsbetween
objects encodedin
rulesandlinks,
producing
a graph of nodes andarcs.
The
slotsin
aframe
storevalues ofassociated attributesof anobjectandcan, also,
house
procedural
knowledge
ofhow
to
computethe
value of a slot.The
frames
are organizedinto
aspecialization
hierarchy
where ancestral properties areinherited
by
descendants.
Instances
offrames
arekept in
the
database
and canbe
categorized as verified orhypothetical.
Associated
with each
instance is
a strengththat
is
computed whenthe
instantiation
occurred.The
value ofthe
strength of aninstance is
recomputedif
the
instance is
updated.Relational knowledge is carefully
encodedin
SIGMA
to
be
usedby
the
high-level
systemto
verify
that
the
relationholds
between
two
objects orto
locate
an objectthat
is
constrainedby
a
relationship
to
another object.Control
knowledge,
which candescribe
a relationbetween
two
objects,
is
embedded within rules.In
otherwords,
a relationbetween
a pair of objects canbe
expressed as a rule
in
the
frame representing
one ofthe
objects.A
rulehas
three
components:acontrol condition whichspecifieswhena rule can
be
used,
ahypothesis
that
describes
the
objectto
be
created whenthe
conditionis
true,
and an action thatis invoked if
the
hypothesis is
verified.
The
high-level
system exploitsdifferent
parts ofa ruleto
guideits
analysis.Once image
structures are extractedfrom
the
image,
bottom-up
analysis cantake
placeusing
relationalknowledge
to
verify
constraintsbetween
objects.Top-down
analysisis
usedto
look for image
structuresoriginally
missedby
the segmentation processby
using
relationsto
direct
the
analysis asto
whereto
look for
the
missing object,
andto
generatehypotheses.
Instantiations,
ordatabase
entities,
are composed of aniconic description
and asymbolicobject's
frame may be found.
The
symbolicdescription
specifiesthe
values ofthe
slot of theinstantiated
frame
and associated constraints.If
two
database
entitiessatisfy
specificconditions,
then
they
areconsideredconsistent andconsequently,
these
related entitiescanbe integrated. A
situation,
in
turn,
consists of consistententities,
and a compositehypothesis
takes
into
consideration all of
the
constraints onthe
slots ofthe
database
entitiesin
a particular situation.It
is
the
focus
of attention mechanism withinthehigh-level
systemthat
selects a situationthat
has
the
greatest strength and sendsit
to the
compositehypothesis
constructor.The
solutiongenerator,
then,
eitherlocates
aninstance in
the
database
that
satisfies thecompositehypothesis,
or sendsthe
descriptions
ofthe
compositehypothesis
to
thelow-level
systemto
perform resegmentation.The
interpretation
process cyclesthrough
a number ofiterations
until an adequateinterpretation
is
produced.At
theend of eachiteration,
the
query-answering
moduleis
invoked
which enables
descriptions
ofthe
interpretation
to
be displayed
if
a goalhas
been
achieved.2.3.6
Evidential
Reasoning
in
anObject
Recognition System
A
majorissue
in
high-level
visionthatis
unavoidableanddemands
considerable attentionis
that
ofuncertainty handling. It
is
inevitable
that
thedevelopment
of aninferencing
modulefor
a computer vision system will
be influenced
by
how
uncertainty
leading
to
inconsistent
orunreliable
interpretations
is handled.
According
to
Rao
andJain:
"Evidential information
refersto
information
that
is
uncertain,
imprecise,
andinaccurate.
Knowledge-based
systemsoperating
in
complex environments shouldbe
ableto
handle
evidentialinformation"
[Rao88,
77].
A
computervision system
is faced
withthe
problem ofprocessing
muchinformation in
a complexenvironment,
some ofwhichis
incomplete,
inaccurate
and/or uncertain[Cohe82,
Rao88]. While
some systems
handle uncertainty
in
an adhoc
manner,
others emphasize explicituncertainty
Olin
illustrates
how
evidentialreasoning is
appliedin
drawing
inferences in
afairly
formal
manner
[Kim84].
The knowledge
representationschemeutilizedin
the
object recognition systemis
object-oriented.
The
objects are representedin frames
and a specializationhierarchy
is formed
whichexhibits
property inheritance. The domain knowledge is kept
separatefrom
the
inference
engine.Symbolic
representationsofregions,
line,
edges and vertices are extractedfrom
the
image
andused
in
the
process ofhypothesis
generation,
confidence evaluation and confirmation.Initial
hypotheses
are a result ofbottom-up
processing
ofthe
image
and subsequenthypotheses
are aresult of
top-down
processing,
which can acquire additional evidencefrom
the
image.
As
hypotheses,
which are represented asinstances
offrames,
aregenerated,
the
inferencing
processwill
look
for
evidenceto
support or refutethese
hypotheses.
Additional
hypotheses may be
created and new evidence
may
refineinitial hypotheses.
Specifically,
the
knowledge
base includes domain-specific
rulesandheuristics for
objectrecognition while
the
inference
engineinterprets
the
rules andheuristics.
The
classificationreasoning
process ofthe
systemis
modeled afterhuman intelligent
classificationreasoning
[Kim84].
Kim
et al. emphasizethat this
aspect oftheir
systemis based
onthe
observationthat
humans describe
objectsin
terms
ofhow
muchinformation
is
available[Kim84].
Thus,
anobjecthierarchy
is
well-suitedto this
applicationbecause
objects aredescribed
atdifferent levels
ofdetail.
A
confidencefactor,
represented as alikelihood
of ahypothesis
to
be
true,
is
associatedwitha
hypothesis.
After
ahypothesis
is
created,
it
undergoes a processto
confirmor refuteits
class membership.
This
entailsgathering
of relevant evidence andupdating
the
confidencefactor.
The frames
representing
objects encode rules thatdescribe
conditions about attributes ofthe
in
supportof ahypothesis
is kept
separatefrom refuting
evidence.Positive
evidencefor
a classimplies
supportfor the
ancestors ofthe
class whilenegative evidenceimplies
alack
of supportfor
descendants
of the class.The
confidence of aninstance
of a superclassis
greaterthan the
confidence of a subclass
instance.
The
proceduralknowledge
within aframe
consists ofprocedures
to
look for
evidence and rulesto
assessthe
level
of confidence.Kim
proposes adistributed
parallel modelin
an elaboration of adistributed inference
scheme
for
classification where anobjectclasshierarchy
is
employedalong
with acertainty
factor
formalism [Kim85].
In
this
model,
processorscorresponding
to
objectsin
the
hierarchy
passmessages
regarding
the
effect of accumulated evidenceto
neighboring
objects.A
processoroversees
the
gathering
andcombining
of evidencefor
a particularhypothesis.
Evidence for
ahypothesis
canbe
obtainedfrom local
rules,
superclasses andsubclasses,
and
integrated
to
obtain a global confidence value.The
refuting
evidence gatheredfrom
higher
classes
is
propagatedto
the
hypothesized
object'sdescendants,
while supportive evidencecombined
from
the
object'sdescendants
is
propagatedto
its
superclassby
the
processor.The
global confidence
is
representedby
a numberin
the
range of-1 and+
1.
Once
the
confidenceof each
hypothesis is
known,
the
object canbe
classifiedby
beginning
atthe
root node ofthe
object
hierarchy
andselecting
the
nodes of associated subclasses withthe
greatest confidencevalue.
2.3.7 A
TMS Approach to Computer Vision
A
truth
maintenance system(TMS)
represents another approachto
maintaining
consistency
andminimizing
the
search processin
computer vision.The
object recognitionAccording
to
Johan de
Kleer,
truth
maintenance systemsdistinguish between
aproblem-solving
modulethat
embodiesthe
domain-specific
rules and makesinferences,
and atruth
maintenance module
that
is
responsiblefor recording inferences
orthe
state ofthe
search process[deK186]. The functions
of aTMS include maintaining
a record of allinferences
made, enabling
nonmonotonic
inferences
to
be
made andremoving
contradictions[deK186,
129].
A
justification-based TMS
(JTMS)
recordsalljustifications,
ordependencies,
andfinds
solutions one at atime.
An
assumption-basedTMS
(ATMS)
enables solutionsto
be found
simultaneously.The benefits
derived from using
aTMS include maintaining consistency
andconducting
more efficientsearches.
VICTORS has
been
implemented
using
a scene ofoverlapping
rectangles andhas
identified
allincidents
of aknown
figure
withinthe
scene[PROV88,
230].
Multiple
interpretations
ofthe
figure
are possiblein
the
scene.The internal
representationofdependencies
is
adependency
graph where antecedents and consequents ofdependency
relations are specified.Although VICTORS
has
been
implemented
with ajustification-based
and an assumption-basedTMS,
according
to
Provan,
an objectivein
developing
the
system wasto
detect
multipleinterpretations
atthe
sametime, therefore,
animplicit
preferencetoward
anATMS
is
expressed[Prov88].
The
structure ofVICTORS
is divided into
a general constraint engine and a generalreasoning
engine.The
constraint engine generates constraints andthe
reasoning
engineis
responsible
for
preserving
consistency.The
task
ofthe
TMS
is
to
remove multiple orcontradictory
labels.
The
powerof anATMS
lies in its ability
to
locate
solutionssimultaneously using
breadth-first
search,
andby
refraining from
exploration of portions ofthe
search spaceknown
to
be
scene,
Provan
illustrates
that
aTMS
canbe effectively
employedin
avisionsystem,
but
assertsthat the
performanceofthe
ATMS
slows with a greater amount ofinterpretations
[Prov88,
235].
Although
the softwaretools
usedto
implement
the
high-level
inference
systemin
this
thesis
do
notinclude
aTMS
utility,
the
basic
concepts ofmaintaining consistency
andminimizing
CHAPTER 3
CONCEPTUAL ISSUES
IN HIGH-LEVEL VISION
Although
there exists aseemingly
wide range of approachesto
dealing
effectively
withthe
high-level issues
in
a computervisionsystem,
the
unifying
theme
ofthepreviously discussed
systems can
be
reducedto
modeling
ahuman
reasoning
process.This,
in
turn, naturally leads
to the
question of whatto
reason with(in
a visualdomain),
hence,
knowledge
representationbecomes
another majorissue.
Reasoning
in
a visual environmentfurther
constitutesmechanismsfor
inferencing
andcontrolling
the
searchstrategy
as requiredby
the
recognitionprocess,
andhandling
uncertainty
which arisesdue
to
the nature of the visual environment.The
surveyedsystems
basically
areattempting
to
solve similar problemsin
ways that are restrictedby
the
specific application
domain
andthe
objectives ofthe
particularsystemmany
visionsystems aredesigned for
a special purpose.The
following
discussion
provides an analysis ofthe
underlying
conceptual
basis
of variousapproaches surveyedin
this thesis
and establishesthegroundworkfor
the
implementation
developed in
this thesis.3.1
Visual Knowledge Representation
andOrganization
A
computervisionsystemis continually
confronted withprocessing
animmense
amountof
information.
The
process of visualinterpretation,
specifically,
is
concerned withproviding
a symbolic
description
ofa scene andtherefore,
relies on models ofexpectations,
prototypicala
scene,
for
example,
they
areapplying
their
expectations ofthe
situation,
along
with physicalevidence
to
their storeofknowledge
to
reach afinal
conclusion.The issues
that
knowledge-based
vision systems are confrontedwith, according
to
Rao
and
Jain,
include
defining
whatknowledge is
relevantto the
problem,
deciding
how
the
knowledge is
to
be
represented,
anddetermining
how
and whenthe
knowledge is
to
be
used[Rao88,
66].
Hanson
andRiseman
point out akey
issue
in
the
development
of a computervisionsystem:
A
central problemin image
understanding
is
the
representation and appropriateuse of all available sources of
knowledge
during
the
interpretation
process.Each
of
the
many different kinds
ofknowledge
that
may be
relevant at various pointsduring
interpretation imposes different kinds
of constraints onthe
underlying
representation.In general,
the representations mustbe
sensitive enoughto
ca