manipulation with mobile robots
DISSERTATION
submitted in partial fulfillment
of the requirements for the degree
Doktor Ingenieur
(Doctor of Engineering)
in the
Faculty of Electrical Engineering and Information Technology
at TU Dortmund University
by
Dipl.-Ing. Thomas Nierobisch
Schwäbisch Gmünd, Germany
Date of submission: 21st January 2014
First examiner: Univ.-Prof. Dr.-Ing. Prof. h.c. Dr. h.c. Torsten Bertram
Second examiner: Univ.-Prof. Dr.-Ing. Bernd Tibken
frontier, it is exciting and disorganised; there is often no reliable
authority to appeal to - many useful ideas have no theoretical
grounding, and some theories are useless in practice."
Forsyth and Ponce
In the future, autonomous service robots are supposed to relieve people of the burden of monotonous and tedious tasks like pickup and delivery. Vision, being the most important human sensor and feedback system, is considered to play a prominent role in the future of robotics. Robust techniques for visual robot navigation, object recognition and vision-assisted object manipulation are essential in service robotics tasks. Mobile manipulation in service robotics applications requires the alignment of the end-effector with recognized objects of unknown pose. Image-based visual servoing provides a means of model-free manipulation of objects solely relying on 2D image information.
In this thesis, contributions to the field of decoupled visual servoing for object manipulation as well as navigation are presented. A novel approach for large view visual servoing of mobile robots is presented that decouples the gaze and navigation control via a virtual camera plane, which enables the visual controller to use the same natural landmarks efficiently over a large range of motion. In order to complete the repertoire of reactive visual behaviors, an innovative door passing behavior and an obstacle avoidance behavior using omnivision are designed. The developed visual behaviors represent a significant step towards the model-free visual navigation paradigm relying solely on visual perception. A novel approach for visual servoing based on augmented image features is presented, which has only four off-diagonal couplings between the visual moments and the degrees of motion.
As the visual servoing relies on unique image features, object recognition and pose alignment of the manipulator rely on the same representation of the object. In many scenarios the features extracted in the reference pose are only perceivable across a limited region of the workspace. This necessitates the introduction of additional intermediate reference views of the object and requires path planning in view space. In this thesis, a model-free approach for optimal large view visual servoing is presented that switches between reference views in order to minimize the time to convergence.
The efficiency and robustness of the proposed visual control schemes are evaluated in virtual reality and on the real mobile platform as well as on two different manipulators. The experiments are performed successfully in different scenarios in realistic office environments without any prior structuring. Therefore this thesis presents a major contribution towards
In the future, autonomous service robots are intended to relieve humans of monotonous and physically strenuous tasks, for example by performing fetch-and-carry services. Visual perception is the most important human sense and feedback system and will therefore play an outstanding role in future robotics applications. Robust methods for image-based navigation, object recognition and manipulation are essential for applications in service robotics. Mobile manipulation in service robotics requires the alignment of the end-effector with recognized objects in unknown poses. Visual servoing enables model-free object manipulation solely by taking the two-dimensional image information into account.
This thesis presents contributions to decoupled visual servoing for object manipulation as well as for navigation. A novel approach for large view visual servoing of mobile robots is introduced, in which the gaze control and the navigation control are decoupled by a virtual camera plane. This enables the visual servoing to use the same natural landmarks efficiently over a large range of motion. To complete the repertoire of visual behaviors, an innovative door passing behavior and an obstacle avoidance behavior based on omnidirectional perception are developed. The designed visual behaviors constitute an important step towards the paradigm of purely model-free visual navigation. A novel approach based on image features with an extended number of attributes is presented which, after decoupling of the inputs, exhibits only four undesired couplings between the image moments and the degrees of freedom of motion. In many application scenarios the extracted reference features are only visible in a limited region of the workspace. This requires the introduction of additional intermediate views of the object as well as path planning in the two-dimensional image space. This thesis therefore presents a model-free method for time-optimal large view visual servoing that switches between the individual reference views in order to minimize the convergence time.
The efficiency and robustness of the proposed visual servoing controllers are verified in virtual reality as well as on the real mobile platform and on two different manipulators. The experiments are carried out in various scenarios in everyday office environments without prior structuring. This thesis represents an important step towards visual perception as the sole and universal sensor
1 Introduction
1.1 Mobile manipulation
1.2 Related work
1.3 Objective of this thesis
2 State of the art of computer vision and visual servoing
2.1 Perspective camera, multiple-view geometry and omnivision
2.2 Robust point feature detection for recognition
2.3 Visual navigation
2.4 Image-based visual servoing
2.5 Experimental systems for visual servoing, navigation and localization
3 From vision guided to visual navigation of mobile robots
3.1 Vision-guided navigation
3.1.1 Planning
3.1.2 Topological localization
3.2 Visual behavior for door passing
3.3 Visual behaviors for collision-free navigation
3.3.1 Corridor entering
4 Global visual homing by visual servoing
4.1 General concept
4.2 Virtual camera plane
4.3 Camera gaze control
4.4 Visual navigation control
4.4.1 Control by image Jacobian
4.4.2 Control with image moments and primitive visual behaviors
4.4.3 Control with homography
4.4.4 Experimental results
4.5 Comparison of vision guided and visual navigation
5 Local visual servoing with generic image moments
5.1 Augmented point features
5.2 Generic moments
5.2.1 Moments for rotation
5.2.2 Moments for translation
5.2.3 Coupling analysis of the sensitivity matrix
5.3 Positioning in 4 DOF with augmented point features
5.3.1 Controller optimization
5.3.2 Simulation and experimental results
5.4 Positioning in simulations in 6 DOF with augmented point features
5.5 Alternative: Visual servoing on a virtual camera plane
6.1 Stability analysis depending on feature distribution
6.2 Optimal reference image selection
6.2.1 Control criteria
6.3 Navigation in the image space
6.4 Experimental results
6.4.1 Navigation across a sphere within the virtual reality
6.4.2 Navigation across a semi-cylinder with a 5 DOF manipulator
6.4.3 Navigation across a cuboid with a 6 DOF manipulator
6.5 Alternative: Model-free pose estimation with local visual servoing
6.6 Evaluation and conclusion
7 Conclusions and future work
A Analysis of the grid-based time to contact from optical flow
B Analysis of the sensitivity matrix
Bibliography
The abbreviations used within the scope of this work are ordered alphabetically in the following.
ARIA Advanced Robot Interface for Applications
ARNL Advanced Robotics Navigation and Localization system
a.u. arbitrary units
AUTOSAR AUTomotive Open System ARchitecture
BRIEF Binary Robust Independent Elementary Features
CAD Computer-Aided Design
CMAES Controlled Model-Assisted Evolution Strategy
CV Current View
DBRVS Distance-Based Reference View Selection
DOF Degree Of Freedom
DoG Difference of Gaussian
EKF Extended Kalman Filter
FAST Features from Accelerated Segment Test
FCRVS Fixed Convergence Reference View Selection
FSI Fixed Scale Interpolation
GFTT Good Features To Track
GF-HOG Gradient Field-Histogram of Oriented Gradients
GLOH Gradient Location and Orientation Histogram
GV Goal View
HIL Hardware In the Loop
HOG Histogram of Oriented Gradients
IBVS Image-Based Visual Servoing
IR InfraRed
LQR Linear Quadratic Regulator
MAES Model-Assisted Evolution Strategy
NN Neural Network
ORB Oriented FAST and Rotated BRIEF
ORVS Optimal Reference View Selection
PD Proportional Differential
PTZ Pan Tilt Zoom
RANSAC RANdom SAmple Consensus algorithm
RMSE Root Mean Square Error
ROS Robot Operating System
RV Reference View
SIFT Scale Invariant Feature Transformation
SII Scale Invariant Interpolation
SLAM Simultaneous Localization And Mapping
SNN Single Nearest Neighbor
SURF Speeded Up Robust Features
ToF Time of Flight
ttc time to contact
VSLAM Visual Simultaneous Localization And Mapping
In the present work, vectors and matrices are printed in bold type. Vectors are displayed by minuscule letters, whereas matrices are represented by capital letters, and scalars are expressed in italic style. The nomenclature is sorted as follows: the first classification criterion is Latin before Greek letters, afterwards lower-case before upper-case letters, and finally bold before italic type.
$a$ control action (for appearance based visual servoing)
$a_h$ scaling factor (for homography)
$a_i$, $b_i$ distance of an interest point to its appropriate epipolar line corresponding to the $u$- and $v$-direction, respectively
$a_k$ pixel displacement
$a_m$, $b_m$, $c_m$, $d_m$ model parameters for exponential function
$A$ Hesse matrix
$\alpha$ rotation around the $x$-axis (roll)
$\alpha_a$ correction factor for the adaptive image Jacobian
$\alpha_c$, $\dot{\alpha}_c$ camera pan angle, respectively velocity
$\alpha_{ia}$, $\beta_{ia}$, $\gamma_{ia}$ interior angles
$\alpha_u$, $\alpha_v$ intrinsic camera parameter: scaling factor depending on $\lambda$ and pixel dimensions
$b^{C_\mathrm{ref}}$ image features in the reference frame
$\beta$ rotation around the $y$-axis (pitch)
$\beta_c$, $\dot{\beta}_c$ camera tilt angle, respectively velocity
$c$ performance criterion
$conf_\mathrm{avg}$ mean of the confidence values
$conf_\mathrm{seg}(i,j)$ confidence values in a window with the row and column position $(i,j)$ of the cell
$C$, $C_n$, $C_r$ absolute, normalized and relative number of feature correspondences between the reference view and the current image
$C_\mathrm{ref}$, $C_{\alpha,\beta}$, $C_R$ static and rotated camera coordinate systems, respectively, and camera coordinate system in the image plane
$C_V$ virtual camera coordinate system, respectively virtual camera plane
$CV_i$ $i$-th reference view
$d_\mathrm{kp}$ normalized keypoint descriptor of SIFT features
$d$ distance
$D$ Difference-of-Gaussian
$\Delta f$ error between desired and actual feature locations
$\Delta\hat{f}$ total normalized summed feature error
$\Delta f_\gamma$ correction along $\gamma$ of the averaged keypoint rotation
$\Delta\mathbf{f}_\omega$, $\Delta f_\omega$ predicted motion of the image features caused by $\Delta\Theta_R$
$\Delta\varphi$ feature error between reference and current distortion (camera retreat problem)
$\Delta\Theta_R$ orientational task space error
$\Delta x$ lateral task space error
$\Delta z$ longitudinal task space error
$[e_1^{a}, e_2^{a}]^T$ epipoles from the actual image
$[e_1^{\mathrm{ref}}, e_2^{\mathrm{ref}}]^T$ epipoles from the desired view
$E$ essential matrix describing the epipolar constraint
$\bar{E}(\theta)$, $\bar{E}(\phi)$, $\bar{E}(r)$ mean absolute error in azimuth, elevation and radius
$E_u$, $E_v$ entropy along the $u$- and $v$-axis, respectively
$\varepsilon$ residual error between model and data point (for error function of the M-estimator)
$\varepsilon_d$ dissimilarity (residual error)
$\varepsilon_\gamma$ estimation error for camera rotation
$\eta_1$, $\eta_2$ tuning variables
$f$ current image features, stated depending on the context as $f_i = [u_i, v_i]$ for the $i$-th image feature with coordinates $u_i, v_i$, in the context of SIFT features as $f_i = [u_i, v_i, \phi_i, \sigma_i]$ with the additional attributes orientation $\phi_i$ and scale $\sigma_i$, and in the context of image moments as $f = [f_\alpha, f_\beta, f_\gamma, f_x, f_y, f_z]$
$f_\mathrm{ref}$ reference image features, also used in the context of image moments
$f_\alpha$ image moment for rotation around the $x$-axis
$f_\beta$ image moment for rotation around the $y$-axis
$f_\gamma$ image moment for rotation around the optical axis
$f_x$ image moment for translation along the $x$-axis
$f_y$ image moment for translation along the $y$-axis
$f_z$ image moment for translation along the camera axis
$f_{zd}$ image moment for translation along the camera axis, alternative expression via the distance between point features
$F$ cost function
$G$ Gaussian filter
$\gamma$ rotation around the $z$-axis (yaw), respectively the optical camera axis
$\gamma_t$ angle between orientation of virtual camera plane and template plane
$\gamma_V$ angle between the virtual camera plane and the orientation of the robot
$h$ twice the distance between the parabola's vertex and the focus of an omnidirectional camera
$H$, $\hat{H}$ homography, estimated homography by feature correspondences
$H_u(i)$ relative frequency of features in the $i$-th column
$H_v(i)$ relative frequency of features in the $i$-th row
$I$ current image, also denoted as $I(u, v, t)$ in dependence of the pixel coordinates $u, v$ and time $t$
$I_\mathrm{ref}$ reference image
$[I_u, I_v]^T$ spatial intensity gradient in $u$- and $v$-direction, respectively
$J$ visual image Jacobian
$J^+$ pseudoinverse of the image Jacobian
$J_a$ Jacobian for appearance based visual servoing
$J_e$ Jacobian for visual servoing on epipoles
$J_{v\omega}$ separated Jacobian for rotational motion
$J_{vt}$ separated Jacobian for translational motion
$J_{v\xi u_\xi}$ separated Jacobian for angle and axis of rotation parametrization
$J_{xz}$ separated Jacobian for translational motion, reduced to two degrees of freedom
$J_{dk}$ robot Jacobian for differential kinematics
$J_{f_i}$ image Jacobian for the image moment in $i$, where $i$ stands for $x, y, z, \alpha, \beta, \gamma$
$J_{f_i,j}$ image Jacobian entry for the image moment in $i$ with a movement in $j$, where both $i$ and $j$ stand for $x, y, z, \alpha, \beta, \gamma$ and $i = j$ (desired couplings)
$\tilde{J}_{f_i,j}$ image Jacobian entry for the image moment in $i$ with a movement in $j$, where both $i$ and $j$ stand for $x, y, z, \alpha, \beta, \gamma$ and $i \neq j$ (undesired couplings)
$J_\omega$ separated Jacobian for rotational motion, reduced to one degree of freedom
$k$ constant proportional gain
$k_a$ adaptive gain
$k$ proportional gain factor
$K$ camera calibration matrix as a function of the intrinsic camera parameters
$l_k$ image displacement
$L$ Gaussian-blurred image
$\lambda$ focal length
$\lambda_e$ evaluated individuals of $\lambda$-CMAES
$\lambda_\mathrm{eig}$ eigenvalue
$\lambda_i$ Lagrange multiplier
$\lambda_p$ offspring of $\lambda$-CMAES
$\mu$ control parameter for Levenberg-Marquardt optimization
$\mu(i,j)$ mean of the time to contact values in a segment with the row and column position $(i,j)$ of the cell
$\mathbf{n}$ normal vector of a plane
$n$, $n_\mathrm{min}$, $n_\mathrm{max}$ number of feature correspondences, respectively minimum/maximum
$\nabla_\mathrm{pw}$ divergence for each pairing window
$\omega$ rotational velocity
$\omega_R$, $\omega_R^\mathrm{max}$ rotational velocity of non-holonomic robot, rotational velocity limit
$\Omega$ spatial neighborhood around image feature, respectively point of interest
$p_i$ world point
$p_i$ point in image plane
$p_v$ point in virtual camera plane
$\pi(s, a)$ optimal policy (for appearance based visual servoing)
$\phi$ canonical orientation of the keypoint
$\varphi$, $\varphi_\mathrm{ref}$ current and reference angle between two points forming a line relative to the horizontal line
$q$ robot joint angles
$\dot{q}$ robot joint velocities
$Q$ action value function (for appearance based visual servoing)
$r$ camera position
$\dot{r}$ camera velocity
$r_f$ horizontal distance from focus to parabola of an omnidirectional camera
$r_{XY}$ Pearson's correlation coefficient describing the linear dependency between two stochastic variables $X$ and $Y$
$R$ rotation matrix
$\rho$ error function of the M-estimator
$\rho$, $\alpha$ polar coordinates
$s$ object appearance (in angular color co-occurrence histograms)
$\sigma$ image feature scale, especially in the context of SIFT and SURF features, also referred to as the standard deviation of the Gaussian
$\sigma_e$ parameter to regulate outlier suppression (for error function of the M-estimator)
$\sigma_u$, $\sigma_v$ variance of the feature distribution
$t$ translation vector
$ttc$, $ttc_\mathrm{avg}$ time to contact, mean time to contact
$ttc_{nv}$ one of the total time to contact estimates computed from the corresponding flow vectors
$^{C_{\alpha,\beta}}T_{C_R}$ transformation from the camera coordinate system to the rotated camera coordinate system
$^{C_\mathrm{ref}}T_{C_{\alpha,\beta}}$ transformation of the rotated camera coordinate system into the static camera coordinate system
$^{C_V}T_{C_\mathrm{ref}}$ transformation from the fixed reference frame centered at the focal point to the virtual camera plane
$^{C_V}T_{C_R}$ transformation from the camera plane to the horizontal virtual camera
$T_\mathrm{ext}$ extrinsic homogeneous transformation matrix
$T_\mathrm{int}$ intrinsic homogeneous transformation matrix
$\theta_\mathrm{az}$, $\phi_\mathrm{el}$, $r_\mathrm{sc}$ reference azimuth, elevation and radius in spherical coordinates
$\hat{\theta}_\mathrm{az}$, $\hat{\phi}_\mathrm{el}$, $\hat{r}_\mathrm{sc}$ estimated azimuth, elevation and radius in spherical coordinates
$\theta_\mathrm{icp}$ intrinsic camera parameter: angle between the axes of the retinal image
$\Theta_m$ model parameters (for error function of the M-estimator)
$\Theta_R$ orientation of the robot
$u$ pixel coordinate in $x$-direction of the camera coordinate system
$[u, v, 1]^T$ homogeneous 2D image coordinates
$[\hat{u}, \hat{v}, 1]^T$ normalized 2D image coordinates
$[\bar{u}, \bar{v}]^T$ deviation of the feature centroid from the origin
$[\dot{u}, \dot{v}]^T$ optical flow
$[u_0, v_0]^T$ intrinsic camera parameter: principal point describing the intersection of the optical axis with the image plane
$[u_\mathrm{cog}, v_\mathrm{cog}, 1]^T$ feature centroid of current view
$[\hat{u}_\mathrm{cog}, \hat{v}_\mathrm{cog}, 1]^T$ feature centroid of goal view
$[u_V, v_V, 1]^T$ 2D image coordinates in the virtual camera plane
$[u_\mathrm{vcog}, v_\mathrm{vcog}]$ centroid of the $u$-, respectively $v$-coordinate of the current view expressed in the horizontal virtual camera plane after the feature rotation about $\Delta\Theta_R$
$[\hat{u}_\mathrm{vcog}, \hat{v}_\mathrm{vcog}]$ centroid of the $u$-, respectively $v$-coordinate of the reference view expressed in the horizontal virtual camera plane after the feature rotation about $\Delta\Theta_R$
$u_\xi$ axis of rotation parametrization
$U\Lambda V^T$ singular value decomposition (SVD) of a matrix
$v$ velocity
$v$ pixel coordinate in $y$-direction of the camera coordinate system
$v_R$ translational velocity of non-holonomic robot; $v_R$ is composed of $v_{R_z}$ in longitudinal direction and $v_{R_x}$ in lateral direction
$v_R^\mathrm{Left}$, $v_R^\mathrm{Right}$ commanded velocity for the left and right wheel of the robot, respectively
$v_R^\mathrm{max}$ translational velocity limit
$w_i$ dynamic weight for decoupling $f_x$ and $f_y$
$w_{i,\mathrm{norm}}$ normalized dynamic weight (to be independent of the distance $z$)
$w(u, v)$ weighting function, e.g. for optical flow or Hesse matrix
$x$ position $[x, y, z]$ and orientation $[\alpha, \beta, \gamma]$ of the end-effector
$[x, y, z, 1]^T$ homogeneous point coordinates
$[x_R, z_R, \theta_R]^T$ state of non-holonomic robot
$x_i$ data point (for error function of the M-estimator)
$[X, Y]^T$, $[\bar{X}, \bar{Y}]^T$ stochastic variables, mean values of stochastic variables
$\xi$ angle of rotation parametrization
$z_f$ horizontal axis of parabolic mirror
1 Introduction
In the future, service robots are supposed to liberate people from the burden of monotonous and tedious tasks. Robots perceive their environment by means of force, touch, proximity or visual feedback with the objective of performing complex manipulation tasks in dynamic, unstructured environments whose complexity exceeds the capabilities of current robotic manipulators in industrial settings. Pickup and delivery tasks constitute a novel domain of application for intelligent service robots. This development is triggered by more powerful and affordable sensors, increased computational power and the advent of lightweight manipulators. This thesis is a contribution towards the goal of realizing mobile manipulation with autonomous service robots.
Vision, being the most important human sensor and feedback system, is considered to play a prominent role in the future of robotics. Mobile manipulation in service robotic applications requires localization, navigation, object recognition as well as object manipulation. All these tasks are currently achieved with advanced sensors such as expensive laser scanners, affordable sonar as well as camera systems. Several tasks like obstacle avoidance and 3D world modeling are easily achieved by applying laser sensors. In order to disseminate service robots on a broad scale, their costs have to be reduced. Thus, new territory has to be entered in order to replace laser scanners with cameras as a universal sensor. Camera systems offer the major advantage that they enable the recognition of objects as well as people, including their gestures and facial expressions, in addition to their applicability for localization and navigation. However, they provide high-dimensional and noisy data, which require information processing and reasoning in order to compensate for the information complexity compared to lasers. Therefore this thesis focuses on the challenging task to achieve mobile
1.1 Mobile manipulation
A general comprehensive outline of mobile manipulation is given by the Technical Committee on Mobile Manipulation:
"The ultimate goal of Autonomous Mobile Manipulation is the execution of complex manipulation tasks, in unstructured and dynamic environments, in which cooperation with humans may be required. To achieve this goal, several scientific and engineering challenges, currently beyond the state of the art in robotics, must be addressed." [146].
Mobile manipulation necessitates different skills such as planning, localization as well as deliberative navigation and object recognition in conjunction with object manipulation. The complexity of this mission arises from the high-dimensional perceptual data afflicted with uncertainties as well as from the system complexity that emerges from the mobile platform itself, but even more from the dynamics and ambiguities of the environment.
Given a scenario in which the human instructs the mobile platform with tasks such as table setting or pickup and delivery, the robot first of all has to localize itself in its dynamic environment, as neither offices nor households are static. Localization is essential for planning as well as mission supervision. After the problem "Where am I?" is solved, navigation is required in order to address the problem of "How to get from A to B?". The navigation is supposed to guide the robot towards a goal destination, for example passing a door, while simultaneously avoiding collisions. A large variety of navigation schemes is provided in the literature, mostly using combinations of different sensors. This thesis follows the paradigm of purely vision-based navigation, neglecting other kinds of sensors and merely utilizing image data. Therefore all important skills for navigation of autonomous mobile robots, such as obstacle avoidance, natural landmark orientation for goal-oriented navigation as well as door passing, are designed solely based on visual perception. The skills for navigation using vision are supposed to be efficient to implement and robust enough to guarantee the safe operation of the mobile platform.
Once the designated goal location is reached, the mobile platform needs to recognize and handle daily objects in household environments. Object recognition and manipulation rely on the same object representation, which is sparse in order to fulfill the memory constraints of the underlying hardware. The task of object manipulation consists of the alignment of the end-effector with recognized objects of unknown pose. Image-based visual servoing provides a means of model-free manipulation of objects solely relying on 2D image information. Therefore this thesis provides a significant step towards manipulation of daily objects relying on natural texture, even if the grasp pose of the object is outside the current view of the object.
equipped with sonar sensors. Two camera systems, a monocular pan-tilt camera and an omnidirectional camera, are mounted on the platform for localization, navigation and object recognition. A manipulator with a two-finger gripper from Neuronics is installed on the platform. The eye-in-hand camera is designated for closed-loop object manipulation. The manipulator reduces the field of view of the omnidirectional camera. This imposes no constraint on the navigation with the omnidirectional camera described later on, because the remaining field of view of around 300° still contains all relevant environmental content.
[Figure labels: gripper camera for object grasping, manipulator, omnidirectional camera, pan-tilt camera, sonar sensors, mobile platform]
Figure 1.1: Mobile robot.
1.2 Related work
The mobile platform is provided with an Advanced Robot Interface for Applications (ARIA) [100]. ARIA already incorporates control of the robot's velocities, odometric, sonar and laser measurements as well as collision-free navigation by means of reactive behaviors based on its sonar or laser data. In order to achieve goal-oriented navigation, additional packages
Nav-control of the robot's actions, e.g. the progress of the task in the map, are at the disposal of the customer. With these packages the customer has a fully operational robot, which, after an initial mapping stage, navigates without collisions in a goal-oriented manner in dynamic environments. To achieve even more complex tasks in the context of service robotics, such as human recognition, human-machine interaction as well as object recognition and manipulation, additional sensors for visual perception are required. While a service robot takes over more tedious tasks from humans, it is indispensable to reduce the overall costs, especially for the hardware, in order to finally achieve the economic breakthrough in the consumer market. Therefore the motivation arises to design the crucial capabilities such as localization and navigation, as well as advanced skills such as object manipulation, with a single cost-efficient sensor system in conjunction with highly advanced control methodologies, rather than employing multiple kinds of expensive sensors in parallel. This trend from hardware to software intelligence occurs in many industry branches with severe pricing pressure, e.g. the automotive industry. Cameras represent an efficient solution to this dilemma because their ratio of possible applications and skills to price is much more favorable than that of laser scanners. Therefore, in its first part, this thesis aims at achieving a navigation performance with visual perception similar to that of the already existing commercial software with laser sensors. This provides the basis for additional applications such as object manipulation, which are treated in the second half of this thesis.
The robot control is based on a hybrid architecture [15] depicted in figure 1.2, composed of a planning layer, a coordination layer and a subordinate reactive layer. The role of the planning layer consists in generating the mission plan and its surveillance, including global localization of the robot, preloaded path planning for goal-oriented navigation as well as object manipulation. The coordination layer activates or deactivates those reactive behaviors that are necessary for the successful realization of the plan and adequate in the current context. It is also responsible for the diagnosis of the robot's status, mission surveillance and emergency or fallback strategies. The operation of the reactive layer follows the behavior-based paradigm [18], as it abandons any abstract representation of the environment and decides about the motion commands based only on the current perception provided by the sensors (behavior representation). A behavior is represented by a direct map from the stimulus, for example the distance measurement, to the response, in the case of mobile robots the motor commands. In the case of navigation, an obstacle avoidance behavior guarantees the safety of the robot with respect to collisions with surrounding objects. Other reactive behaviors, e.g. constant velocity, corridor entering and homing, are primarily useful for local navigation. The object manipulation requires a behavior which transfers the manipulator into a pre-grasping position. This thesis investigates the potential of camera systems to replace the sensor inputs for the planning and the reactive layer and to completely dispense with distance sensors such as the lasers employed in commercially available robot systems.
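The stimulus-to-response mapping of the reactive layer and the behavior selection of the coordination layer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the controller used in this thesis: the percept keys `dist_left`/`dist_right`, the velocities, the thresholds and the selection rule in the hypothetical `coordinate` function are all invented for the example. Each behavior keeps no model of the environment and maps the current percept directly to a motor command $(v, \omega)$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Motor command (v, omega): translational velocity [m/s], rotational velocity [rad/s].
Command = Tuple[float, float]
Percept = Dict[str, float]


@dataclass
class Behavior:
    name: str
    # Direct stimulus-to-response map; no environment model is stored.
    respond: Callable[[Percept], Command]


def constant_velocity(percept: Percept) -> Command:
    # Drive straight ahead at a fixed (hypothetical) speed.
    return (0.3, 0.0)


def obstacle_avoidance(percept: Percept) -> Command:
    # Turn towards the freer side (positive omega = turn left);
    # stop translating when an obstacle is very close.
    left, right = percept["dist_left"], percept["dist_right"]
    v = 0.0 if min(left, right) < 0.3 else 0.1
    omega = 0.5 if left > right else -0.5
    return (v, omega)


BEHAVIORS: Dict[str, Behavior] = {
    "cruise": Behavior("constant velocity", constant_velocity),
    "avoid": Behavior("obstacle avoidance", obstacle_avoidance),
}


def coordinate(percept: Percept, behaviors: Dict[str, Behavior] = BEHAVIORS,
               safety_margin: float = 0.6) -> Command:
    # Coordination layer (hypothetical rule): activate obstacle avoidance
    # whenever an obstacle is inside the safety margin, otherwise cruise.
    if min(percept["dist_left"], percept["dist_right"]) < safety_margin:
        return behaviors["avoid"].respond(percept)
    return behaviors["cruise"].respond(percept)


print(coordinate({"dist_left": 2.0, "dist_right": 2.0}))  # (0.3, 0.0)
print(coordinate({"dist_left": 1.0, "dist_right": 0.4}))  # (0.1, 0.5)
```

Because the behaviors only see a percept dictionary, the source of the distances (sonar, laser or a vision-based estimate) can be exchanged without touching the behavior code, which is the kind of sensor substitution the hybrid architecture is meant to allow.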
[Figure labels: planning layer, coordination layer, reactive layer; mission planning, localization, navigation, object manipulation, path planning, trajectory generation; coordination, optimization, management; diagnosis and surveillance; behavior selection, behavior adaption, behavior representation, behavior coordination; emergency strategies, system monitoring; control, stabilization; sensor (sonar, laser); actuators]
Figure 1.2: Hybrid three-layer model for robot control with planning, coordination and reactive layer, with laser and sonar as input for localization and navigation in the planning layer as well as for the behaviors in the reactive layer.
mobile robot navigation [39] distinguish between indoor and outdoor navigation. A comprehensive overview of visual navigation is provided by [16], which categorizes visual navigation into map-based navigation and mapless navigation, whereas map-based navigation is subdivided into metric and topological map-based navigation. Metric maps represent the environment in relative coordinates with respect to an absolute world coordinate system, whereas topological maps possess a graph-like structure with nodes and edges, representing abstract locations and the repertoire of behaviors to transit between them, without any geometric information [86]. Localization techniques using laser sensors are well-established [51]. A framework for Simultaneous Localization And Mapping (SLAM) is provided by [147], which builds a map from scratch while continuously localizing the robot in the online generated map. Efficient approaches such as FAST-SLAM [102] nowadays achieve real-time mapping of the environment. Despite the substantial progress regarding VSLAM (Visual Simultaneous Localization And Mapping) [142, 36, 138], maps provided by VSLAM using local feature extraction are sparse and therefore not dense enough for the metric navigation required by standard laser-based navigation schemes. However, these maps are suited for robot localization [76]. Recent approaches [148] generate dense 3D maps off-line by means of stereo vision with additionally integrated landmarks; nonetheless, the overall localization is inferior to simple topological localization approaches using omnivision such as [55]. In [44] a VSLAM scheme provides a 3D-voxel map by FAST-SLAM in
al map-based navigation paradigm using passive visual sensors, representing environments by a directed graph. Topological maps require less memory and are suitable for the representation of large indoor environments. Topological SLAM using local feature extraction is presented in the works of [155, 3], which seem to outperform appearance-based visual SLAM by global feature extraction [65]. The choice of the localization methodology has a direct impact on the required collection of behaviors (referred to as mapless navigation in [16]). Topological map-based navigation requires visual perception representing the visual nodes, also referred to as waypoints, as well as the visual behaviors associated with the edges in order to navigate between them. Depending on the degree of integration of the image processing systems into the hybrid control architecture, the approaches are classified throughout this work into vision-guided and visual navigation schemes. Visual navigation solely uses visual information as input for the planning as well as for the reactive layer, whereas vision-guided approaches are supplemented by active distance sensors such as sonar or laser sensors providing further input for the reactive layer.
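A topological map of this kind can be illustrated as a graph whose nodes are abstract locations (visual waypoints) and whose edges carry the behavior that transits between them. The following Python sketch is hypothetical: the node names, the behavior labels and the use of breadth-first search (fewest transitions, no metric costs) are assumptions for illustration, not the planner used in this thesis.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical topological map: node -> list of (neighbouring node, behavior on that edge).
# Only connectivity is stored, no geometric information.
TopoMap = Dict[str, List[Tuple[str, str]]]

OFFICE_MAP: TopoMap = {
    "office": [("corridor_a", "door_passing")],
    "corridor_a": [("office", "door_passing"), ("corridor_b", "corridor_entering")],
    "corridor_b": [("corridor_a", "corridor_entering"), ("kitchen", "homing")],
    "kitchen": [("corridor_b", "homing")],
}


def plan_behaviors(topo: TopoMap, start: str, goal: str) -> Optional[List[Tuple[str, str]]]:
    """Return the (behavior, next node) sequence with the fewest transitions, or None."""
    parent: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            plan = []
            # Walk back along the stored parents to recover the behavior chain.
            while parent[node] is not None:
                prev, behavior = parent[node]
                plan.append((behavior, node))
                node = prev
            return plan[::-1]
        for neighbour, behavior in topo.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = (node, behavior)
                queue.append(neighbour)
    return None


print(plan_behaviors(OFFICE_MAP, "office", "kitchen"))
```

The returned list is the sequence of behaviors that the coordination layer would have to activate one after another; each transition is considered finished when the visual node at the end of the edge is recognized.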
Visual reactive behaviors omit metric maps for representing the environment; instead, they perceive and track objects by coupling the immediate decision about the robot movement directly with the visually observed appearance of the local environment. Such approaches are either based on locating specific landmarks in the environment, follow an appearance-based approach [154], or measure the optical flow [4]. The corridor entering described in [4] operates by balancing the optical flow in the right and left hemisphere of an omnidirectional camera system; however, it fails if texture is missing or non-uniformly distributed in the corridor environment. Vision-based navigation in unstructured environments solely uses natural features and structures without adding supplementary landmarks or texture elements to facilitate the navigation task. [105] describes a vision-based homing behavior with gaze control for decoupling the camera and the robot movement via a virtual camera plane. However, in this context the environment is structured systematically by placing landmarks at selected waypoints to support vision-based navigation.
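The flow-balancing principle of the corridor entering behavior in [4] can be sketched as follows. As a simplifying assumption, a forward-looking perspective image split into a left and a right half stands in for the two hemispheres of the omnidirectional camera; the dense flow field, the gain and the threshold `min_flow` are hypothetical inputs. The sketch returns `None` when too little flow is measured, which mirrors the failure case for untextured corridors mentioned above.

```python
from typing import Optional

import numpy as np


def flow_balance_steering(flow_u: np.ndarray, flow_v: np.ndarray,
                          gain: float = 0.5, min_flow: float = 1e-3) -> Optional[float]:
    """Rotational velocity command that balances the mean optical-flow magnitude
    of the left and right image halves (positive = turn left)."""
    magnitude = np.hypot(flow_u, flow_v)
    half = magnitude.shape[1] // 2
    left = float(magnitude[:, :half].mean())
    right = float(magnitude[:, half:].mean())
    total = left + right
    if total < min_flow:
        # No measurable flow, e.g. untextured walls: the balance cue is unavailable.
        return None
    # Under forward motion the closer wall produces the larger flow, so steer away from it.
    return gain * (right - left) / total


# Flow only in the right half of a 4 x 8 flow field: the right wall appears closer.
u = np.zeros((4, 8))
u[:, 4:] = 2.0
print(flow_balance_steering(u, np.zeros((4, 8))))  # 0.5
```

Normalizing by the total flow makes the command independent of the forward speed, but it also shows why the scheme degrades when texture, and therefore measurable flow, is unevenly distributed between the two sides.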
Robotic manipulation of daily-life objects in unstructured environments is an essential requirement in service robotic applications. Image-Based (IBVS) and Position-Based Visual Servoing (PBVS) are gaining visibility due to their importance for robotic manipulation and grasping. Visual servoing is defined in the standard tutorial [70] as:
"the use of one or more cameras and a computer vision system to control the position of the robot's end-effector relative to the work piece as required by the task".
Position-based visual servoing estimates the objet's pose relative to amera,as the error
betweentheatualandthegoalposeisdenedintheCartesianspae. Themaindrawbaks
of position-basedvisualservoing are 3D modelgenerationof objets, on-lineestimationof
3D pose, system instabilitiesbeause of oarse pose estimations aswell as objets leaving
for grasping is demonstrated to the robot during a learning stage and a set of reference features is extracted from the image. A geometric object model or an explicit reconstruction of the object scene becomes obsolete for image-based visual servoing. Due to these two major advantages, namely being model-free and easy to demonstrate for the instructor, this approach is particularly promising for mobile manipulation.
The categorization of [24] and [25] for different image-based visual servoing concepts is pursued, and different approaches in literature are ranked regarding their applicability to mobile manipulation. Jacobian based visual servoing inverts the analytical relation between differential changes in task space and differential changes of pixel coordinates to reduce the error in the image space between the actual and desired feature coordinates [151]. Hybrid visual servoing defines the error between actual and desired pose partially in image and partially in Cartesian space [26]. Partitioned visual servoing, respectively visual servoing with decoupled image moments, defines image moments which are related approximately one-to-one to their degrees of motion, resulting in a simple linear control problem in the image space [143]. Appearance based visual servoing [37] captures the overall appearance of an object rather than single features and relates this appearance by an offline learned interaction matrix to control values that steer the end-effector into the reference pose. Other approaches for visual servoing, such as visual servoing on epipoles [120] or by structured light, are neglected because of their minor importance for service robotics.
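To make the Jacobian based scheme concrete, the following sketch uses the classical interaction matrix of a point feature in normalized image coordinates, as found in the visual servoing literature, together with the proportional law $v = -\lambda L^{+} e$. It is an illustration under stated assumptions, not one of the controllers developed in this thesis: the depths $Z$ of the points are assumed to be known, the gain `lam` is an arbitrary value, and the function names are hypothetical.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z.

    It relates the camera twist (vx, vy, vz, wx, wy, wz) to (x_dot, y_dot).
    """
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(points, desired, depths, lam=0.5):
    """Classical IBVS law v = -lam * pinv(L) @ e for stacked point features."""
    points = np.asarray(points, dtype=float)
    desired = np.asarray(desired, dtype=float)
    error = (points - desired).ravel()               # e = s - s*
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])
    return -lam * np.linalg.pinv(L) @ error          # 6-vector camera twist
```

The pseudo-inverse solves the stacked $2N \times 6$ system in the least-squares sense, so the commanded velocity decreases the image error to first order; the coupling of all six degrees of freedom inside $L$ is what the decoupled approaches discussed above aim to remove.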
Figure 1.3 depicts a radar chart that compares different visual servoing concepts with respect to various aspects. Visual servoing by image Jacobian, hybrid visual servoing, visual servoing by decoupled moments as well as appearance based visual servoing are compared regarding stability, calibration issues, convergence, compliance with service robotic specifications and biology inspiration. Stability is divided into global asymptotic and local asymptotic stability as well as heuristic approaches for stability analysis, e.g. convex polygons. Hybrid visual servoing has the highest ranking due to its global asymptotic stability. Appearance based visual servoing has the lowest ranking, as the stability analysis of the optimal policy (feed-forward) is not analytically feasible. On the contrary, appearance based approaches in principle require no intrinsic or extrinsic camera calibration and therefore achieve the highest ranking in this category. Even though the three other approaches require intrinsic camera calibration, this is nowadays no severe limitation because of the standard tools for camera calibration [136]. The aspect of convergence comprises the computational complexity as well as the convergence (behavior) of the image error and the task space error in addition to the required actuating variables. Hybrid visual servoing and visual servoing with decoupled moments exhibit fast convergence in conjunction with low computational complexity. The computational complexity of course highly depends on the feature extraction methodology and its application parameters. On the contrary, appearance based visual servoing has high computational demands for extracting appearance, whereas Jacobian based approaches partially show slow convergence depending on the relative pose between
regarding occlusion, unstructured cluttered environments with highly structured objects as well as changing lighting conditions. Additionally, object recognition as well as visual servoing should rely on the same object representation in order to reduce memory requirements. Appearance based visual servoing requires accurate object segmentation to discriminate different object poses, which is difficult to achieve in textured environments. Nonetheless this methodology directly fulfills the requirement for the same object representation for recognition and positioning. Feature based approaches in literature are presented most frequently using simple feature primitives such as [135]. These features are very efficient to implement but not realistic for service robotic applications because of their low perceptibility across large regions of the workspace as well as their minor ability to discriminate among different objects. Concerning robustness, the potential of feature based approaches is much more promising than that of appearance based visual servoing, due to feature redundancy and under the assumption of a solved correspondence problem. Even if appearance based approaches are ranked highest in the category biology inspiration, these approaches are suboptimal regarding the other categories and are therefore not pursued in the context of this thesis. It is an interesting point that approaches adopted from nature are less robust than purely technically motivated methodologies regarding mobile manipulation.
In conclusion, visual servoing with decoupled moments and hybrid visual servoing are best suited for service robotic applications and are further investigated to achieve full applicability for mobile object manipulation. Furthermore this thesis postulates visual servoing with decoupled moments, as it requires neither a partial pose estimation, which relies on intrinsic camera calibration, nor geometric assumptions about the scene. Exploiting the potential of visual servoing with decoupled image moments regarding the decoupling of the translational and rotational degrees of freedom, while fulfilling service robotic specifications, is a challenging task. The authors in [117], however, state that:
"Finding a set of visual features which produces a decoupled interaction matrix for any camera pose seems an unreachable issue".
Nonetheless a diagonal interaction matrix is highly desirable and is therefore investigated in the context of this thesis, with the result of finding an interaction matrix with only four remaining couplings, independent of the camera pose.
In many scenarios the features extracted in the reference pose are only perceivable across a limited region of the workspace. Different terminologies are reported in literature for visual servoing across several intermediate reference views of the object in order to navigate towards the final reference pose. Path planning in image space [97], visual servoing due to visual memory [123] as well as large view visual servoing [105] are conceptualized for global visual servoing. Notice that local visual servoing is defined as visual servoing towards a single reference image, whereas global visual servoing is concerned with the navigation
[Figure 1.3: radar chart rating image Jacobian, hybrid, decoupled moments and appearance based visual servoing on the axes stability proof (global, local, heuristic), calibration (none, intrinsic, extrinsic), convergence (slow, medium, fast), service robotics specifications (inapplicable, partially, fulfilled) and biology inspired (human-like, partially, technical)]
Figure 1.3: Characteristics of different visual servoing concepts regarding stability, convergence, service robotic specifications and biology inspiration.
the desired pose by switching between reference views is the ultimate goal of the cited approaches. Global visual servoing is a challenging task, which is imperative to achieve mobile manipulation independent of the object's initial view in the camera image.
1.3 Objective of this thesis
This thesis provides a contribution towards mobile manipulation in unstructured environments with the ambitious goal to accomplish all skills and tasks exclusively by means of visual perception. In order to achieve mobile manipulation solely relying on visual perception, this work yields new insights in two major domains, namely visual navigation in the
• How to achieve time-optimal visual homing for mobile robots dealing with natural texture in dynamic environments, using camera systems with a limited field of view that require gaze and position control in parallel?
• How to design collision-free navigation using omnivision, considering noisy image measurements and sparsely textured office environments?
• How to accomplish door detection, door tracking and door passing in a coherent, purely vision-based framework with closed-loop door traversing?
• How to design visual navigation in unstructured office environments with performance comparable to state-of-the-art approaches using sonar and laser sensors?
Visual servoing for object manipulation is mainly concerned with the following challenges:
• How to achieve markerless and decoupled visual servoing for optimal convergence in task space in the context of manipulating daily objects?
• How to realize time-optimal visual positioning of the gripper relative to an object even if the desired grasping position is outside the current field of view of the camera?
• Which strategy is better in the context of service robot applications: a look-then-move strategy in conjunction with local visual servoing close to the reference pose, or visual servoing over several reference images?
This thesis is organized as follows: Chapter 2 provides the state of the art of computer vision as well as visual servoing in order to keep this thesis self-contained. Chapter 3 is dedicated to the progress from vision-guided navigation with laser based stimuli to purely vision-based navigation relying solely on visual stimuli. Global visual homing based on visual servoing with an omnidirectional camera in conjunction with a pan-tilt camera is introduced in chapter 4. A comparison of vision-guided and visual navigation is additionally provided at the end of chapter 4. In order to accomplish mobile manipulation, chapter 5 demonstrates a novel approach for markerless and decoupled visual servoing to align the robot end-effector with recognized objects of unknown pose. Conventional point features are augmented by additional attributes like scale and orientation, which establish a one-to-one correspondence between the individual image moment and its corresponding degrees of freedom. The limited visibility of features necessitates the introduction of additional intermediate reference views of the object and requires path planning in view space. Therefore a new methodology for global (large view) visual servoing is introduced in chapter 6. The path planning in the image space is flexible, as the decoupled visual servoing relies on a dynamic set of feature correspondences rather than a static set of individual features. This property allows the online selection of optimal reference views during servoing to the goal view, resulting in time-optimal control. Finally this thesis concludes with a summary and outlook on future work in chapter 7, in which the major
State of the art of computer vision and visual servoing
This chapter provides the basis for computer vision and visual servoing, the terminology required for the comprehension of this thesis as well as the classification of this thesis into the scientific context. This chapter is organized as follows: Image formation is described in section 2.1 for perspective and multiple cameras as well as for omnivision. Image understanding by robust feature detection for object recognition is treated in section 2.2. The two major topics, visual navigation and image based visual servoing, are described in detail in sections 2.3 and 2.4, respectively, and the experimental systems in section 2.5.
2.1 Perspective camera, multiple-view geometry and omnivision
The general perspective projection model describes the relation between a homogeneous point $p_c(x, y, z, 1)$ in the 3D camera space coordinate system and its projection onto the 2D image coordinate system in homogeneous coordinates $p(u, v, 1)$, where $\lambda$ denotes the focal length:

$$
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \frac{1}{z}
\begin{pmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}. \qquad (2.1)
$$

The image point $p(u, v, 1)$ on the retinal image is transformed to the normalized image plane according to equation 2.2. This transformation yields the normalized pixel coordinates $[\hat{u}, \hat{v}, 1]^T$
comparison of images originating from different camera systems:

$$
\begin{pmatrix} \hat{u} \\ \hat{v} \\ 1 \end{pmatrix}
= K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
\quad \text{with} \quad
K = \begin{pmatrix}
\alpha_u & -\alpha_u \cot(\theta_{icp}) & u_0 \\
0 & \dfrac{\alpha_v}{\sin(\theta_{icp})} & v_0 \\
0 & 0 & 1
\end{pmatrix}. \qquad (2.2)
$$

The intrinsic camera parameters $\alpha_u$ and $\alpha_v$ describe the scaling factors depending on $\lambda$ and the pixel dimensions. The intersection of the optical axis with the image plane is described by the principal point $[u_0, v_0]^T$. Due to manufacturing imperfections of an actual camera, the angle $\theta_{icp}$ between the axes of the retinal image may not be equal to 90°. The extrinsic camera parameters describe the position and orientation of the camera coordinate system relative to the world coordinate system. To express this relation, the rotation matrix $R$ and the translation vector $t$ are combined in a homogeneous transformation matrix $T_{ext}$:

$$
[u, v, 1]^T = \frac{1}{z}\, T_{int}\, T_{ext}\, [x, y, z, 1]^T
\quad \text{with} \quad
T_{int} = \begin{pmatrix} K & \mathbf{0} \end{pmatrix},
\quad
T_{ext} = \begin{pmatrix} R & t \\ \mathbf{0}^T & 1 \end{pmatrix}. \qquad (2.3)
$$

Intrinsic camera parameters as well as radial distortions of the pixel coordinates $u$ and $v$ caused by lens imperfections are determined by a camera calibration process [136]. The radial distortion is corrected by a polynomial function of the squared distance between the optical center of the image and the given pixel coordinates (cf. chapter 3.3 in [50]). Detailed information about the complete camera system layout and the image formation process can be found in [67], whereas the standard references [50], [78] mainly focus on image analysis from low level to high level vision.
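Equations 2.1 to 2.3 can be checked numerically. The sketch below projects a camera-frame point with the ideal model of equation 2.1, builds the intrinsic matrix $K$ of equation 2.2 (with $\cot\theta_{icp}$ computed as $1/\tan\theta_{icp}$) and maps pixel coordinates back to normalized coordinates with $K^{-1}$. The function names and the numeric values used in the checks are assumptions for illustration, and the two-coefficient model `radial_distort` with hypothetical coefficients `k1`, `k2` is only one common form of the polynomial in the squared radius mentioned above, not necessarily the model used by the calibration tool [136].

```python
import numpy as np

def project(p_c, lam):
    """Equation 2.1: ideal pinhole projection of a camera-frame point (x, y, z)."""
    x, y, z = p_c
    P = np.array([[lam, 0.0, 0.0, 0.0],
                  [0.0, lam, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    return P @ np.array([x, y, z, 1.0]) / z

def intrinsic_matrix(alpha_u, alpha_v, u0, v0, theta_icp=np.pi / 2):
    """Equation 2.2: intrinsic matrix K with angle theta_icp between the pixel axes."""
    return np.array([
        [alpha_u, -alpha_u / np.tan(theta_icp), u0],
        [0.0, alpha_v / np.sin(theta_icp), v0],
        [0.0, 0.0, 1.0],
    ])

def normalize(uv, K):
    """Map pixel coordinates (u, v) to normalized coordinates (u_hat, v_hat) with K^-1."""
    p = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return p[:2] / p[2]

def radial_distort(xy, k1, k2):
    """Scale a normalized point by 1 + k1*r^2 + k2*r^4, a polynomial in r^2."""
    r2 = xy[0] ** 2 + xy[1] ** 2
    return np.asarray(xy, dtype=float) * (1.0 + k1 * r2 + k2 * r2 ** 2)
```

Solving with `np.linalg.solve` instead of forming $K^{-1}$ explicitly is numerically preferable; the result is the same normalized point whether or not the pixel axes are skewed.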
Multiple view geometry is concerned with the partial or full 3D reconstruction of the environment based on multiple views of a scene. The essential and fundamental matrices describe the epipolar constraint for calibrated and uncalibrated camera systems, respectively, which relates a point in one image to a line in the other independent of the scene's geometry [90]. The essential matrix is stated as:

$$
E = [T_x]\, R, \qquad (2.4)
$$

where the vector $t$ is expressed as a skew-symmetric matrix $[T_x]$ so that $t \times x = [T_x]\, x$. The essential matrix degenerates for small translations, rendering it unsuitable for automation and control engineering topics such as visual servoing or image-based oscillation measurements.
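The skew-symmetric construction of equation 2.4 and the epipolar constraint can be verified with a synthetic two-view setup. In this sketch the second camera frame is assumed to be related to the first by $x_2 = R x_1 + t$ (a convention chosen for illustration), and the rotation, translation and scene point are arbitrary example values. With normalized image points $\hat p_i = x_i / z_i$, the residual $\hat p_2^T E \hat p_1$ vanishes for every scene point, while $E$ itself vanishes as $t \to 0$, which is the degeneracy mentioned above.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x with skew(t) @ x == np.cross(t, x)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def rot_z(angle):
    """Rotation about the optical axis (example rotation only)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def essential_matrix(R, t):
    """Equation 2.4: E = [t]_x R, assuming x2 = R @ x1 + t."""
    return skew(t) @ R

R = rot_z(0.1)
t = np.array([0.3, 0.0, 0.05])
E = essential_matrix(R, t)
X1 = np.array([0.2, -0.4, 3.0])      # scene point in the first camera frame
X2 = R @ X1 + t                       # same point in the second camera frame
p1, p2 = X1 / X1[2], X2 / X2[2]       # normalized homogeneous image points
residual = p2 @ E @ p1                # epipolar constraint, zero up to rounding
```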
The homography $H$, however, describes a point-to-point transformation between two perspective views of a plane:

$$
a_h\, [\hat{u}_2, \hat{v}_2, 1]^T = H\, [\hat{u}_1, \hat{v}_1, 1]^T
\quad \text{with} \quad
H = R + \frac{t\, n^T}{d}, \qquad (2.5)
$$

where $R$ and $t$ are defined by the rotation and translation between the optical camera centers, $n$ is the normal vector of the plane and $d$ the distance between the optical center of the first camera and the plane. Contrary to the essential matrix the homography matrix
The homography is estimated from at least four corresponding features located on a common plane, assuming that the scaling factor $\hat{h}_{33} = 1$, via:

$$
p_2 = \hat{H} p_1
\;\Leftrightarrow\;
\begin{pmatrix} \hat{u}_2 \\ \hat{v}_2 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\hat{h}_{11} & \hat{h}_{12} & \hat{h}_{13} \\
\hat{h}_{21} & \hat{h}_{22} & \hat{h}_{23} \\
\hat{h}_{31} & \hat{h}_{32} & \hat{h}_{33}
\end{pmatrix}
\begin{pmatrix} \hat{u}_1 \\ \hat{v}_1 \\ 1 \end{pmatrix}, \qquad (2.6)
$$

where $\hat{H}$ is, apart from a scaling factor $a_h$, identical to the actual homography matrix $H$. The estimated homography $\hat{H}$ is decomposed via singular-value decomposition into the unknown rotation matrix, the scaled direction vector and the normal vector [47]:

$$
\hat{H} = U \Lambda V^T
\;\Leftrightarrow\;
\Lambda = U^T \hat{H} V
\;\Leftrightarrow\;
\Lambda = U^T \left( d\, R + t\, n^T \right) V. \qquad (2.7)
$$

As the decomposition of the homography yields ambiguous solutions, the correct solution is obtained by taking into account only the physically plausible solutions and by a subsequent comparison of the estimated with the assumed normal vector. Multiple view geometry for partial or complete real world reconstruction, e.g. by means of the homography, is treated extensively in the works of [60] and [134].
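The plane-induced homography of equation 2.5 and its estimation from four correspondences with $\hat h_{33} = 1$ as in equation 2.6 can be sketched as follows. The scene values are assumptions for illustration, the plane is assumed to satisfy $n^T x_1 = d$ in the first camera frame with $x_2 = R x_1 + t$, and the estimator is the basic linear solution of the eight resulting equations, without the coordinate normalization and outlier rejection a practical implementation would add. The last line uses the known property that the middle singular value of $R + t n^T / d$ equals one, which allows removing the unknown scale $a_h$ (up to sign) before a decomposition as in equation 2.7.

```python
import numpy as np

def homography_from_pose(R, t, n, d):
    """Equation 2.5: H = R + t n^T / d for the plane n^T X1 = d, assuming X2 = R X1 + t."""
    return R + np.outer(t, n) / d

def estimate_homography(p1, p2):
    """Solve equation 2.6 with h33 = 1 from four or more point pairs (least squares)."""
    A, b = [], []
    for (u1, v1), (u2, v2) in zip(p1, p2):
        A.append([u1, v1, 1.0, 0.0, 0.0, 0.0, -u2 * u1, -u2 * v1])
        b.append(u2)
        A.append([0.0, 0.0, 0.0, u1, v1, 1.0, -v2 * u1, -v2 * v1])
        b.append(v2)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def transfer(H, p):
    """Map a normalized image point with H and dehomogenize."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Synthetic scene: plane z = 2 in the first camera frame (n = e_z, d = 2).
angle = 0.05
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.2, -0.1, 0.1])
n, d = np.array([0.0, 0.0, 1.0]), 2.0
H = homography_from_pose(R, t, n, d)

pts1 = [(-0.3, -0.2), (0.4, -0.1), (0.2, 0.3), (-0.1, 0.25)]
pts2 = [transfer(H, p) for p in pts1]
H_hat = estimate_homography(pts1, pts2)                         # equals H / H[2, 2]
H_rescaled = H_hat / np.linalg.svd(H_hat, compute_uv=False)[1]  # middle singular value -> 1
```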
Conventional monocular cameras have a limited field of view. In order to overcome this constraint, omnidirectional cameras, also referred to as catadioptric cameras, combine lenses (refractive, i.e. dioptric) and mirrors (reflective, i.e. catoptric) to enlarge the field of view. The most important design objective for catadioptric sensors is to achieve a single effective viewpoint, which allows the reconstruction of perspective views and panoramic images with arbitrary orientations. A detailed overview of single viewpoint catadioptric sensors and the image formation process is provided by [8, 53].
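[Figure 2.1, panels a) to c); panel b) labels: spheric mirror, parabolic mirror, vertex, focus, world points $p_{c1}$, $p_{c2}$, image points $p_1$, $p_2$]
Figure 2.1: a) Omnidirectional camera; b) Geometry of a parabolic omnidirectional camera; c) Omnidirectional image.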
The omnidirectional sensor used in this thesis consists of a camera DFK-31AF03 from
azimuth and approximately 60° in elevation. Figure 2.1 depicts the omnidirectional camera (a), a schematic view of the projection geometry (b) as well as an omniview (c), referred to in the following as omnivision. The catadioptric sensor consists of a parabolic mirror in conjunction with a spheric mirror and a perspective lens system. Parabolic mirrors have an orthographic projection, which guarantees that the light rays from the environment are reflected parallel towards the spheric mirror. The spheric mirror also satisfies the single viewpoint constraint, whereby the center of projection lies in the center of the sphere. A sharp single viewpoint image is obtained since the center of the sphere coincides with the focal point of the perspective lens system. Figure 2.1 b) shows the geometry of such a parabolic omnidirectional camera. The world points $p_{c1}$ and $p_{c2}$ are orthographically reflected to the points $p_1$ and $p_2$ in the image plane. The vertex of the parabola has the distance $h/2$ to the focus, which is the single viewpoint of the parabola. The parameter $h$ is also the radius $r_f$ at $z_p = 0$. Thus, the expression for the reflecting surface follows as:

$$
z_p = \frac{h}{2} - \frac{r_f^2}{2h}. \qquad (2.8)
$$

In figure 2.1 c) the omniview is presented, which shows the blind spot in the center, in analogy to the human eye, originating from a pin in the center of the spheric mirror to prevent multiple reflections.
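Equation 2.8 can be evaluated directly. The sketch below, with an arbitrarily assumed mirror parameter `h` and hypothetical function names, checks the two properties stated above (the vertex lies $h/2$ above the focus at the origin, and the surface reaches $z_p = 0$ at $r_f = h$) and the focal property behind the orthographic projection: a ray aimed at the focus is reflected parallel to the mirror axis.

```python
import numpy as np

def mirror_height(r_f, h):
    """Equation 2.8: height z_p of the parabolic mirror at radius r_f (focus at the origin)."""
    return h / 2.0 - r_f ** 2 / (2.0 * h)

def reflect_towards_focus(r_f, h):
    """Reflect a ray aimed at the focus off the mirror point at radius r_f (x-z plane)."""
    p = np.array([r_f, 0.0, mirror_height(r_f, h)])
    d = -p / np.linalg.norm(p)              # incoming unit direction, towards the focus
    n = np.array([r_f / h, 0.0, 1.0])       # gradient of z - h/2 + r^2 / (2h)
    n /= np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n       # law of reflection
```

For every radius the reflected direction is $(0, 0, 1)$, i.e. parallel to the axis, independently of where the ray hits the mirror.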
Omnivision is well suited for mobile robot applications as it captures the entire surroundings, which facilitates robot localization as well as robot navigation. Furthermore, due to their large field of view, omnidirectional camera systems are ideally suited for workspace surveillance of product assistants [141].
2.2 Robust point feature detection for recognition
For developing vision-based control concepts for mobile manipulation in unstructured environments, unambiguous and recognizable features have to be extracted from the camera images. Contrary to the industrial context, where markers or labels are imprinted on objects and in the surrounding environment, this approach is not feasible for service robotic tasks. Thus, the algorithms employed in this thesis have to recognize the features in the camera image if the camera-object distance changes (scaling invariance), the lighting conditions vary, the camera rotates around its optical axis or is subject to affine transformations. Associating the same feature in different perspectives is referred to as the correspondence problem.
In the following, two prominent and useful algorithms from literature for local feature extraction and for solving the correspondence problem are presented in detail. Primarily
extraction and matching. Based on this efficient implementation, a sophisticated method for feature extraction, the Scale Invariant Feature Transform (SIFT), is described, which is utilized within the scope of this work.
GFTT consists of a corner detection in order to localize interest points and a subsequent tracking of the same feature over consecutive images. Strong corners in the image are detected with the second-moment matrix $A$ according to the ideas of the Harris corner and edge detector [59]:

$$
A = \sum_u \sum_v w(u, v)
\begin{pmatrix} I_u^2 & I_u I_v \\ I_u I_v & I_v^2 \end{pmatrix}, \qquad (2.9)
$$

with the image derivatives $I_u$ and $I_v$ in $u$- and $v$-direction, respectively, and the isotropic weighting $w(u, v)$, such as a Gaussian kernel. The two eigenvalues $\lambda_{eig1}$ and $\lambda_{eig2}$ are extracted from $A$. If $\lambda_{eig1}$ and $\lambda_{eig2}$ are close to zero, the image region is homogeneous. If one of the two eigenvalues is much greater than the other, the image region contains an edge. A corner is detected only if both eigenvalues have large positive values and satisfy the constraint that $\min(\lambda_{eig1}, \lambda_{eig2})$ is larger than a threshold. The corner represents an interest point which is tracked in consecutive images by a small window $\Omega_s$ assuming purely translational motion. In order to avoid false tracking of features, the dissimilarity is measured for a large window $\Omega_l$ as follows:

$$
\varepsilon_d = \iint_{\Omega_l} \left[ I_2(R\, p_r + t) - I_1(p_r) \right]^2 \, dp_r. \qquad (2.10)
$$

If the residual error $\varepsilon_d$ exceeds a certain threshold, the feature is classified as lost and is therefore rejected. GFTT are well suited for local feature tracking but not for advanced service robotic applications. Of course scale invariance can also be achieved by a scale-independent Harris detection using a Gaussian pyramid; nonetheless, the feature extraction described in the following provides a representation of the features that is better suited for recognition, even under large displacements and rotations as well as changes in lighting conditions.
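The corner criterion and the dissimilarity check can be sketched in a few lines. The code below builds the matrix $A$ of equation 2.9 for a single patch from finite-difference derivatives with a Gaussian weight (the value of `sigma` is an assumption), takes the smaller eigenvalue in closed form, and evaluates a discrete version of $\varepsilon_d$ from equation 2.10 restricted to the purely translational case $R = I$ with the shift already applied to the second patch. The function names are hypothetical; a practical system would rather call an optimized library routine such as OpenCV's `cv2.goodFeaturesToTrack`.

```python
import numpy as np

def structure_matrix(patch, sigma=1.5):
    """Matrix A of equation 2.9 for one image patch, using a Gaussian weight w(u, v)."""
    I_v, I_u = np.gradient(patch.astype(float))   # derivatives along rows (v), columns (u)
    rows, cols = patch.shape
    v, u = np.mgrid[0:rows, 0:cols]
    w = np.exp(-((u - (cols - 1) / 2) ** 2 + (v - (rows - 1) / 2) ** 2) / (2 * sigma ** 2))
    return np.array([[np.sum(w * I_u * I_u), np.sum(w * I_u * I_v)],
                     [np.sum(w * I_u * I_v), np.sum(w * I_v * I_v)]])

def min_eigenvalue(A):
    """Smaller eigenvalue of the symmetric 2x2 matrix A (corner score)."""
    half_trace = 0.5 * (A[0, 0] + A[1, 1])
    root = np.sqrt(0.25 * (A[0, 0] - A[1, 1]) ** 2 + A[0, 1] ** 2)
    return half_trace - root

def dissimilarity(patch1, patch2):
    """Discrete form of equation 2.10 for R = I: sum of squared differences."""
    diff = patch2.astype(float) - patch1.astype(float)
    return float(np.sum(diff ** 2))
```

A homogeneous patch yields a score of zero, a straight edge a score close to zero (one large, one vanishing eigenvalue), and only a corner gives two large eigenvalues.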
The Scale Invariant Feature Transform introduced by Lowe [93] is an approach to detect and extract local features from an image with a methodology similar to GFTT but with superior performance in terms of recognition, as it combines the progress made in image processing since the first presentation of GFTT. SIFT features are invariant with respect to scale and orientation and largely robust to changes in illumination. SIFT features are conveniently matched across similar views of the same scene. The utilization of specific markers in vision-based applications becomes obsolete, as the environment and textured objects naturally contain suitable SIFT features. SIFT features are distinguishable because their associated keypoint descriptor includes a compact, albeit specific representation of the surrounding image region. These properties make them particularly suitable for vision-based localization, visual servoing, object recognition and pose estimation. As their properties are essential for the visual controllers introduced later on, the four major computation stages are briefly described.
different scales. The scale of the SIFT feature is defined by $\sigma$. The difference of Gaussians (DoG) is calculated from the difference of convolved images at the neighboring scales $\sigma$ and $k\sigma$, respectively. Given a Gaussian-blurred image $L(u, v, \sigma) = G(u, v, \sigma) * I(u, v)$, where

$$
G(u_i, v_i, \sigma_i) = \frac{1}{2\pi\sigma_i^2} \exp\left( -\frac{u_i^2 + v_i^2}{2\sigma_i^2} \right) \qquad (2.11)
$$

is a variable scale Gaussian, $I$ denotes the image to be processed and $*$ is the convolution operator. The convolution of an image with a DoG filter is defined by

$$
D(u, v, \sigma) = \big( G(u, v, k\sigma) - G(u, v, \sigma) \big) * I(u, v) = L(u, v, k\sigma) - L(u, v, \sigma). \qquad (2.12)
$$

The convolved images are grouped by octaves, which correspond to doubling the value of $\sigma$, resulting in a pyramid of DoG images with different scales.

(2) Keypoint localization: The interest points in the image are referred to as keypoints. They are identified as local maxima or minima of the DoG images across the scales. Every pixel in the DoG image is checked for its candidate validity by comparing it with its eight neighbors at the same scale and with the nine corresponding neighbors at each of the two neighboring scales. If the pixel exhibits either a local maximum or a local minimum, it is selected as a candidate keypoint. Every candidate keypoint needs interpolation to accurately determine its position. Keypoints with low contrast values are removed, and responses along edges are also eliminated. Once the positions of the keypoints are assigned, their orientation can be determined.
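Equations 2.11 and 2.12 and the extremum test of the keypoint localization step can be sketched as follows. This is a simplified illustration under assumed parameters (a single octave, base scale `sigma0 = 1.6`, factor `k = sqrt(2)`, four blur levels) with hypothetical function names; the Gaussian is applied as two separable 1D convolutions with zero padding, and the sub-pixel interpolation, contrast threshold and edge-response rejection described above are omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalized 1D Gaussian; the 2D kernel G of equation 2.11 is separable."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def blur(image, sigma):
    """L(u, v, sigma) = G(u, v, sigma) * I(u, v); assumes the image is larger than the kernel."""
    g = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="same")

def dog_stack(image, sigma0=1.6, k=np.sqrt(2.0), levels=4):
    """Equation 2.12 for one octave: D_i = L(k * sigma_i) - L(sigma_i), sigma_i = sigma0 * k**i."""
    L = [blur(image.astype(float), sigma0 * k ** i) for i in range(levels)]
    return np.array([L[i + 1] - L[i] for i in range(levels - 1)])

def is_extremum(dog, s, r, c):
    """Compare D[s, r, c] with its 8 neighbours at scale s and 9 at each of s - 1 and s + 1."""
    cube = dog[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
    centre = dog[s, r, c]
    others = np.delete(cube.ravel(), 13)   # index 13 is the centre of the 3x3x3 cube
    return bool(np.all(centre > others) or np.all(centre < others))
```

The 26 comparisons per pixel correspond to the eight same-scale neighbors plus nine neighbors in each adjacent DoG image, which is why at least three DoG images per octave are needed.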
(3) Orientation assignment: The orientation of the keypoint is determined using a gradient orientation histogram in the neighborhood of the keypoint. The contribution of each neighboring pixel is weighted by the gradient magnitude and a Gaussian window with a width $\sigma$ that is 1.5 times the scale of the keypoint. Peaks in the histogram correspond to dominant orientations. A separate keypoint is created for the direction corresponding to the histogram maximum and for any other direction within 80% of the maximum value. The properties of the keypoints are all described relative to the keypoint orientation to accomplish orientation invariance.
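The orientation assignment can be illustrated with a small sketch that accumulates a histogram of gradient orientations around a keypoint, weighted by gradient magnitude and by a Gaussian window with $\sigma$ equal to 1.5 times the keypoint scale, and returns every bin that reaches 80% of the maximum. The 36 bins of 10° each follow Lowe's original paper and are an assumption here, as is the function name; histogram smoothing and the parabolic interpolation of the peak positions are omitted, so the returned orientations are bin centres.

```python
import numpy as np

def dominant_orientations(patch, scale, bins=36, peak_ratio=0.8):
    """Return orientations (radians, bin centres) within peak_ratio of the histogram maximum.

    patch: square image region centred on the keypoint; scale: keypoint scale.
    """
    I_v, I_u = np.gradient(patch.astype(float))
    magnitude = np.hypot(I_u, I_v)
    angle = np.mod(np.arctan2(I_v, I_u), 2 * np.pi)
    rows, cols = patch.shape
    v, u = np.mgrid[0:rows, 0:cols]
    sigma_w = 1.5 * scale                                   # Gaussian window width
    weight = np.exp(-((u - (cols - 1) / 2) ** 2 + (v - (rows - 1) / 2) ** 2)
                    / (2 * sigma_w ** 2))
    hist, edges = np.histogram(angle, bins=bins, range=(0.0, 2 * np.pi),
                               weights=magnitude * weight)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres[hist >= peak_ratio * hist.max()]
```

Each returned orientation would give rise to a separate keypoint, and the descriptor of that keypoint is then expressed relative to it.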
(4) Keypoint descriptor: With the information about the keypoint orientation, a keypoint descriptor is constructed, which is a set of orientation histograms computed over 4 by 4 pixel subregions in the neighborhood of the keypoint. The histograms are expressed with respect to the keypoint orientation. Each histogram has eight bins, and each descriptor consists of a 4 by 4 array of histograms around its keypoint. Each SIFT feature thus consists of a normalized keypoint descriptor $d_{kp}$ with $4 \times 4 \times 8 = 128$ elements.

Matching of SIFT features: Matching of SIFT features involves the determination of corresponding features in two views of the same scene. Therefore the SIFT features are
length 128. In order to make the matching even more robust, the relative rather than the absolute similarity is evaluated using the relationship between the highest and the second highest value of similarity, which is required to exceed a specified threshold.
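The relative-similarity test described above is commonly implemented as the distance-ratio criterion from Lowe's work [93]: a match is accepted only if the nearest descriptor is clearly closer than the second nearest. The following is a minimal sketch under assumptions: Euclidean distances between descriptors, a brute-force search, a ratio threshold of 0.8 (the value suggested by Lowe, used here as a default), and a hypothetical function name.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Return index pairs (i, j) passing the nearest/second-nearest distance-ratio test.

    desc1: (N, D) and desc2: (M, D) descriptor arrays (D = 128 for SIFT), M >= 2.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)        # distance to every candidate
        nearest, second = np.argsort(dist)[:2]
        if dist[nearest] < ratio * dist[second]:        # unambiguous best match only
            matches.append((i, int(nearest)))
    return matches
```

A query that is equally close to two candidates is rejected, which is exactly the ambiguity the relative test is meant to filter out.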
The presented control concepts can be realized identically with SURF (Speeded Up Robust Features) [13, 12], because these features also provide additional attributes such as scale and orientation. Other methods for local feature extraction, such as GLOH (Gradient Location and Orientation Histogram) [99], HOG (Histogram of Oriented Gradients) [34] or its significant extension GF-HOG (Gradient Field-Histogram of Oriented Gradients) [68], only differ in the methodology used to capture the local appearance in the feature descriptor. [127] recently introduced ORB (Oriented FAST and Rotated BRIEF), which efficiently combines the keypoint detector FAST [125] with the efficient feature descriptor BRIEF [21]. FAST extracts keypoints even faster than GFTT or SIFT. However, as these methods do not offer any major improvement apart from faster computation, e.g. based on discretization by integral images as in SURF, they are not considered further.
Literature reports two distinct approaches to solve the pose estimation problem. Model based methods rely on the extraction of specific geometric features in the image, such as corners and edges. Robust features like SIFT, GFTT or SURF are mandatory for model-based object recognition and pose estimation. Clusters of robust image features are utilized in the first step to recognize the object. Afterwards the extracted features are compared and related to a known geometric model of the object. Efficient and reliable approaches for model based pose estimation with known correspondences have been proposed by [38, 115]. The drawback of these methods, as of any other model based approach, is the requirement of an a-priori geometric model of the object, an exact camera calibration as well as the solution of the correspondence problem, which becomes inherently more difficult in case of occlusion and ambiguous features. Following the model based paradigm, [56] therefore describes an approach for the construction of 3D metric models from multiple images taken with an uncalibrated handheld camera for augmented reality applications.
In contrast, global appearance based methods capture the overall visual appearance of an object, e.g. the multidimensional receptive fields introduced by [132]. They neither depend on the extraction of individual features nor do they face the correspondence problem. The basic idea is to capture the appearance by statistical representations such as histograms in order to calculate a probability of the object's presence in the current image view, an idea which is inherent to almost every appearance based approach. The methodology roughly consists of three steps: first, low-dimensional local feature descriptors are calculated on a regular grid on the image; these descriptors are then quantized and aggregated in multi-dimensional histograms; finally, they are compared to stored histograms of known objects exploiting the Bayes rule. The major difference between object recognition by clusters of SIFT features and by means of multidimensional receptive fields can be summarized as
as-descriptor can be determined, thereby exploiting all image information available. Multidimensional receptive fields, on the contrary, calculate a low-dimensional feature descriptor on a regular grid, thereby giving away information in textured, highly distinguishable image regions and additionally sampling homogeneous regions with less information for the histograms. [22] proposed distance color co-occurrence histograms for object recognition of multi-colored, textured objects, emphasizing the conservation of geometric information as the major advantage of color co-occurrence histograms compared to regular color histograms. Based on this fundamental idea, [43] propose color co-occurrence histograms for object recognition as well as 1 DOF pose estimation. The angular extension of color co-occurrence histograms is suggested by [106] in the context of pose estimation of robot players (AIBOs) as well as for 2 DOF pose estimation of multi-colored, textured objects [107].
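The three steps of histogram-based appearance recognition can be sketched as follows. This is a deliberately simplified illustration and not the multidimensional receptive field method of [132] itself: the local descriptor is assumed to be only the quantized gradient orientation sampled on a regular grid, each object model is a normalized histogram of these codes, and the posterior over the known objects follows from Bayes' rule under a uniform prior and an assumed independence of the grid samples. All names and parameters (`step`, `bins`, `eps`) are hypothetical.

```python
import numpy as np

def grid_codes(image, step=2, bins=8):
    """Steps 1 and 2: gradient orientation on a regular grid, quantized into `bins` codes."""
    I_v, I_u = np.gradient(image.astype(float))
    angle = np.mod(np.arctan2(I_v, I_u), 2 * np.pi)[::step, ::step]
    return np.minimum((angle / (2 * np.pi) * bins).astype(int), bins - 1).ravel()

def train_histogram(image, bins=8, eps=1e-3):
    """Stored object model p(code | object); eps avoids zero probabilities."""
    counts = np.bincount(grid_codes(image, bins=bins), minlength=bins) + eps
    return counts / counts.sum()

def posterior(image, models, bins=8):
    """Step 3: p(object | codes) via Bayes' rule with a uniform prior, in the log domain."""
    codes = grid_codes(image, bins=bins)
    log_like = np.array([np.sum(np.log(m[codes])) for m in models])
    post = np.exp(log_like - log_like.max())      # subtract max for numerical stability
    return post / post.sum()
```

Because every grid sample contributes to the histogram, homogeneous regions enter the model as well, which illustrates the loss of discriminative power discussed above compared to keypoint-based matching.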
[104] introduce a method that combines appearance and geometric object models in order to achieve robust and fast object detection as well as 2 DOF pose estimation. Their major contribution is the integration of the known 3D geometry of the object during matching and pose estimation by a statistical analysis of the distribution of feature appearances in the view space. Nonetheless their approach requires a 3D model of the object, which is difficult to generate for objects of complex shape; this is the inherent problem of all model based approaches.
Image-based visual servoing, presented in section 2.4, provides the means for model-free object manipulation in service robot applications without prior pose estimation. It requires only an object recognition, e.g. with clusters of GFTT or SIFT features, and a subsequent control in the image space towards the desired locations of the features in the image plane. This approach leads to a high position accuracy, but nonetheless achieves only local convergence due to viewpoint limitations. Therefore an initial pose estimation is again mostly mandatory, as the current object view does not necessarily contain the features close to the manipulation position. Global visual servoing, introduced in chapter 6, overcomes the above stated limitations, thereby constituting a promising and more efficient approach compared to model and appearance based object recognition and pose estimation, as it does without any model knowledge while still retaining the high position accuracy.
2.3 Visual navigation
The characterization of the different visual navigation concepts leads to the appraisal of topological map-based navigation with reactive visual behaviors, as stated in section 1.3. Visual navigation draws its inspiration from biology, which provides numerous examples of visual behaviors of insects and birds. It is challenging to design behaviors that are based not on distance sensors but on visual stimuli, considering the burden of high computational complexity and noisy data. The authors in [1] extract the elements of early vision by defining only four fundamental visual primitives, namely color, texture, disparity and optical flow, to be utilized for designing visual behaviors.
Color corresponds to the different wavelengths in the visible range of the light spectrum. It requires model knowledge about the surrounding world, e.g. the color information of objects like doors and side walls. Additionally, the problem of color constancy, i.e. always assigning the same color to a homogeneous monochromatic area in spite of different illumination conditions, as described by the Dichromatic Reflection Model [82], is not solved yet. Therefore color is not suited for navigation in unstructured environments. "Texture is a phenomenon that is widespread, easy to recognise and hard to define" [50]. The term texture is understood in two similar but distinct meanings.
(1) Texture is defined as repeated patterns like carpet, hair or grass, which have a specific response in the frequency domain and are thereby extractable and distinguishable by filter
ban