Visual navigation and servoing for object manipulation with mobile robots


DISSERTATION

submitted in partial fulfillment

of the requirements for the degree

Doktor Ingenieur

(Doctor of Engineering)

in the

Faculty of Electrical Engineering and Information Technology

at TU Dortmund University

by

Dipl.-Ing. Thomas Nierobisch

Schwäbisch Gmünd, Germany

Date of submission: 21st January 2014

First examiner: Univ.-Prof. Dr.-Ing. Prof. h.c. Dr. h.c. Torsten Bertram

Second examiner: Univ.-Prof. Dr.-Ing. Bernd Tibken

frontier, it is exciting and disorganised; there is often no reliable authority to appeal to - many useful ideas have no theoretical grounding, and some theories are useless in practice."

Forsyth and Ponce

In the future, autonomous service robots are supposed to remove the burden of monotonous and tedious tasks like pickup and delivery from people. Vision, being the most important human sensor and feedback system, is considered to play a prominent role in the future of robotics. Robust techniques for visual robot navigation, object recognition and vision-assisted object manipulation are essential in service robotics tasks. Mobile manipulation in service robotics applications requires the alignment of the end-effector with recognized objects of unknown pose. Image-based visual servoing provides a means of model-free manipulation of objects solely relying on 2D image information.

In this thesis contributions to the field of decoupled visual servoing for object manipulation as well as navigation are presented. A novel approach for large view visual servoing of mobile robots is presented by decoupling the gaze and navigation control via a virtual camera plane, which enables the visual controller to use the same natural landmarks efficiently over a large range of motion. In order to complete the repertoire of reactive visual behaviors an innovative door passing behavior and an obstacle avoidance behavior using omnivision are designed. The developed visual behaviors represent a significant step towards the model-free visual navigation paradigm relying solely on visual perception. A novel approach for visual servoing based on augmented image features is presented, which has only four off-diagonal couplings between the visual moments and the degrees of motion. As the visual servoing relies on unique image features, object recognition and pose alignment of the manipulator rely on the same representation of the object. In many scenarios the features extracted in the reference pose are only perceivable across a limited region of the work space. This necessitates the introduction of additional intermediate reference views of the object and requires path planning in view space. In this thesis a model-free approach for optimal large view visual servoing by switching between reference views in order to minimize the time to convergence is presented.

The efficiency and robustness of the proposed visual control schemes are evaluated in the virtual reality and on the real mobile platform as well as on two different manipulators. The experiments are performed successfully in different scenarios in realistic office environments without any prior structuring. Therefore this thesis presents a major contribution towards

In the future, autonomous service robots are supposed to relieve people of monotonous and physically strenuous tasks, for example by performing fetch-and-carry services. Visual perception is the most important human sense and feedback system and will therefore play an outstanding role in future robotics applications. Robust methods for image-based navigation, object recognition and manipulation are essential for applications in service robotics. Mobile manipulation in service robotics requires the alignment of the end-effector with recognized objects of unknown pose. Image-based control enables model-free object manipulation by considering only the two-dimensional image information.

This thesis presents contributions to decoupled image-based control for object manipulation as well as for navigation. A novel approach to image-based wide-range control of mobile robots is introduced. The gaze control and the navigation control are decoupled via a virtual camera plane, which allows the image-based controller to use the same natural landmarks efficiently over a wide range of motion. To complete the repertoire of visual behaviors, an innovative door passing behavior and an obstacle avoidance behavior based on omnidirectional perception are developed. The designed visual behaviors represent an important step towards the paradigm of purely model-free visual navigation. A novel approach based on image features with an extended number of attributes is introduced which, after a decoupling of the input variables, exhibits only four undesired couplings between the image moments and the degrees of freedom of motion. In many application scenarios the extracted reference features are visible only in a limited region of the workspace. This requires the introduction of additional intermediate views of the object as well as path planning in the two-dimensional image space. This thesis therefore presents a model-free methodology for time-optimal image-based wide-range control that switches between the individual reference views in order to minimize the convergence time.

The efficiency and robustness of the proposed image-based controllers are verified in virtual reality as well as on the real mobile platform and on two different manipulators. The experiments are carried out in different scenarios in everyday office environments without prior structuring. This thesis represents an important step towards visual perception as the sole and universal sensor

1 Introduction
1.1 Mobile manipulation
1.2 Related work
1.3 Objective of this thesis

2 State of the art of computer vision and visual servoing
2.1 Perspective camera, multiple-view geometry and omnivision
2.2 Robust point feature detection for recognition
2.3 Visual navigation
2.4 Image-based visual servoing
2.5 Experimental systems for visual servoing, navigation and localization

3 From vision guided to visual navigation of mobile robots
3.1 Vision-guided navigation
3.1.1 Planning
3.1.2 Topological localization
3.2 Visual behavior for door passing
3.3 Visual behaviors for collision-free navigation
3.3.1 Corridor centering

4 Global visual homing by visual servoing
4.1 General concept
4.2 Virtual camera plane
4.3 Camera gaze control
4.4 Visual navigation control
4.4.1 Control by image Jacobian
4.4.2 Control with image moments and primitive visual behaviors
4.4.3 Control with homography
4.4.4 Experimental results
4.5 Comparison of vision guided and visual navigation

5 Local visual servoing with generic image moments
5.1 Augmented point features
5.2 Generic moments
5.2.1 Moments for rotation
5.2.2 Moments for translation
5.2.3 Coupling analysis of the sensitivity matrix
5.3 Positioning in 4 DOF with augmented point features
5.3.1 Controller optimization
5.3.2 Simulation and experimental results
5.4 Positioning in simulations in 6 DOF with augmented point features
5.5 Alternative: Visual servoing on a virtual camera plane

6.1 Stability analysis depending on feature distribution
6.2 Optimal reference image selection
6.2.1 Control criteria
6.3 Navigation in the image space
6.4 Experimental results
6.4.1 Navigation across a sphere within the virtual reality
6.4.2 Navigation across a semi cylinder with a 5 DOF manipulator
6.4.3 Navigation across a cuboid with a 6 DOF manipulator
6.5 Alternative: Model-free pose estimation with local visual servoing
6.6 Evaluation and conclusion

7 Conclusions and future work

A Analysis of the grid-based time to contact from optical flow

B Analysis of the sensitivity matrix

Bibliography

The abbreviations used within the scope of this work are listed alphabetically in the following.

ARIA     Advanced Robot Interface for Applications
ARNL     Advanced Robotics Navigation and Localization system
a.u.     arbitrary units
AUTOSAR  AUTomotive Open System ARchitecture
BRIEF    Binary Robust Independent Elementary Features
CAD      Computer-Aided Design
CMAES    Controlled Model-Assisted Evolution Strategy
CV       Current View
DBRVS    Distance-Based Reference View Selection
DOF      Degree Of Freedom
DoG      Difference of Gaussian
EKF      Extended Kalman Filter
FAST     Features from Accelerated Segment Test
FCRVS    Fixed Convergence Reference View Selection
FSI      Fixed Scale Interpolation
GFTT     Good Features To Track
GF-HOG   Gradient Field-Histogram of Oriented Gradients
GLOH     Gradient Location and Orientation Histogram
GV       Goal View
HIL      Hardware In the Loop
HOG      Histogram of Oriented Gradients
IBVS     Image-Based Visual Servoing
IR       InfraRed
LQR      Linear Quadratic Regulator
MAES     Model-Assisted Evolution Strategy
NN       Neural Network
ORB      Oriented FAST and Rotated BRIEF
ORVS     Optimal Reference View Selection
PD       Proportional Differential
PTZ      Pan Tilt Zoom
RANSAC   RANdom SAmple Consensus algorithm
RMSE     Root Mean Square Error
ROS      Robot Operating System
RV       Reference View
SIFT     Scale Invariant Feature Transform
SII      Scale Invariant Interpolation
SLAM     Simultaneous Localization And Mapping
SNN      Single Nearest Neighbor
SURF     Speeded Up Robust Features
ToF      Time of Flight
ttc      time to contact
VSLAM    Visual Simultaneous Localization And Mapping

In the present work vectors and matrices are printed in bold type. Vectors are displayed by minuscule letters whereas matrices are represented by capital letters, and scalars are expressed in italic style. The nomenclature is sorted as follows: the first classification criterion is Latin before Greek letters, afterwards lower-case before upper-case letters, and finally bold before italic type.

$a$  control action (for appearance based visual servoing)
$a_h$  scaling factor (for homography)
$a_i, b_i$  distance of an interest point to its appropriate epipolar line corresponding to the $u$- and $v$-direction, respectively
$a_k$  pixel displacement
$a_m, b_m, c_m, d_m$  model parameters for exponential function
$A$  Hesse matrix
$\alpha$  rotation around the $x$-axis (roll)
$\alpha_a$  correction factor for the adaptive image Jacobian
$\alpha_c, \dot{\alpha}_c$  camera pan angle, respectively velocity
$\alpha_{ia}, \beta_{ia}, \gamma_{ia}$  interior angles
$\alpha_u, \alpha_v$  intrinsic camera parameter: scaling factor depending on $\lambda$ and pixel dimensions
$b^{C_{\mathrm{ref}}}$  image features in the reference frame
$\beta$  rotation around the $y$-axis (pitch)
$\beta_c, \dot{\beta}_c$  camera tilt angle, respectively velocity
$c$  performance criterion
$\mathrm{conf}_{\mathrm{avg}}$  mean of the confidence values
$\mathrm{conf}_{\mathrm{seg}(i,j)}$  confidence values in a window with the row and column position $(i, j)$ of the cell
$C, C_n, C_r$  absolute, normalized and relative number of feature correspondences between the reference view and the current image
$C_{\mathrm{ref}}, C_{\alpha,\beta}, C_R$  static and rotated camera coordinate systems, respectively, and camera coordinate system in the image plane
$C_V$  virtual camera coordinate system, respectively virtual camera plane
$CV_i$  $i$-th reference view

$d_{kp}$  normalized keypoint descriptor of SIFT features
$d$  distance
$D$  Difference-of-Gaussian
$\Delta f$  error between desired and actual feature locations
$\Delta \hat{f}$  total normalized summed feature error
$\Delta f_{\gamma}$  correction along $\gamma$ of the averaged keypoint rotation
$\Delta f_{\omega}, \Delta \mathbf{f}_{\omega}$  predicted motion of the image features caused by $\Delta\Theta_R$
$\Delta\phi$  feature error between reference and current distortion (camera retreat problem)
$\Delta\Theta_R$  orientational task space error
$\Delta x$  lateral task space error
$\Delta z$  longitudinal task space error
$[e_{1,a}, e_{2,a}]^T$  epipoles from the actual image
$[e_{1,\mathrm{ref}}, e_{2,\mathrm{ref}}]^T$  epipoles from the desired view
$E$  essential matrix describing the epipolar constraint
$\bar{E}(\theta), \bar{E}(\varphi), \bar{E}(r)$  mean absolute error in azimuth, elevation and radius
$E_u, E_v$  entropy along the $u$- and $v$-axis, respectively
$\varepsilon$  residual error between model and data point (for error function of the M-estimator)
$\varepsilon_d$  dissimilarity (residual error)
$\varepsilon_{\gamma}$  estimation error for camera rotation
$\eta_1, \eta_2$  tuning variables
$f$  current image features, stated depending on the context as $f_i = [u_i, v_i]$ for the $i$-th image feature with coordinates $u_i, v_i$, in the context of SIFT features as $f_i = [u_i, v_i, \varphi_i, \sigma_i]$ with the additional attributes orientation $\varphi_i$ and scale $\sigma_i$, also in the context of image moments as $f = [f_{\alpha}, f_{\beta}, f_{\gamma}, f_x, f_y, f_z]$
$f_{\mathrm{ref}}$  reference image features, also used in the context of image moments
$f_{\alpha}$  image moment for rotation around the $x$-axis
$f_{\beta}$  image moment for rotation around the $y$-axis
$f_{\gamma}$  image moment for rotation around the optical axis
$f_x$  image moment for translation along the $x$-axis
$f_y$  image moment for translation along the $y$-axis
$f_z$  image moment for translation along the camera axis
$f_{zd}$  image moment for translation along the camera axis, alternative expression via the distance between point features
$F$  cost function
$G$  Gaussian filter
$\gamma$  rotation around the $z$-axis (yaw), respectively the optical camera axis
$\gamma_t$  angle between orientation of virtual camera plane and template plane
$\gamma_V$  angle between the virtual camera plane and the orientation of the robot
$h$  twice the distance between the parabola's vertex and the focus of an omnidirectional camera

$H, \hat{H}$  homography, estimated homography by feature correspondences
$H_u(i)$  relative frequency of features in $i$-th column
$H_v(i)$  relative frequency of features in $i$-th row
$I$  current image, also denoted as $I(u, v, t)$ in dependence of the pixel coordinates $u, v$ and time $t$
$I_{\mathrm{ref}}$  reference image
$[I_u, I_v]^T$  spatial intensity gradient in $u$- and $v$-direction, respectively
$J$  visual image Jacobian
$J^{+}$  pseudoinverse of the image Jacobian
$J_a$  Jacobian for appearance based visual servoing
$J_e$  Jacobian for visual servoing on epipoles
$J_{v\omega}$  separated Jacobian for rotational motion
$J_{vt}$  separated Jacobian for translational motion
$J_{v\xi u_{\xi}}$  separated Jacobian for angle and axis of rotation parametrization
$J_{xz}$  separated Jacobian for translational motion, reduced to two degrees of freedom
$J_{dk}$  robot Jacobian for differential kinematics
$J_{f_i}$  image Jacobian for the image moment in $i$, whereas $i$ stands for $x, y, z, \alpha, \beta, \gamma$
$J_{f_i,j}$  image Jacobian entry for the image moment in $i$ with a movement in $j$, whereas both $i$ and $j$ stand for $x, y, z, \alpha, \beta, \gamma$ and $i = j$ (desired couplings)
$\tilde{J}_{f_i,j}$  image Jacobian entry for the image moment in $i$ with a movement in $j$, whereas both $i$ and $j$ stand for $x, y, z, \alpha, \beta, \gamma$ and $i \neq j$ (undesired couplings)
$J_{\omega}$  separated Jacobian for rotational motion, reduced to one degree of freedom
$k$  constant proportional gain
$k_a$  adaptive gain
$k$  proportional gain factor
$K$  camera calibration matrix as a function of the intrinsic camera parameters
$l_k$  image displacement
$L$  Gaussian-blurred image
$\lambda$  focal length
$\lambda_e$  evaluated individuals of $\lambda$-CMAES
$\lambda_{\mathrm{eig}}$  eigenvalue
$\lambda_i$  Lagrange multiplier
$\lambda_p$  offspring of $\lambda$-CMAES
$\mu$  control parameter for Levenberg-Marquardt optimization
$\mu_{(i,j)}$  mean of the time to contact values in a segment with the row and column position $(i, j)$ of the cell

$n$  normal vector of a plane
$n, n_{\min}, n_{\max}$  number of feature correspondences, respectively minimum/maximum
$\nabla_{pw}$  divergence for each pairing window
$\omega$  rotational velocity
$\omega_R, \omega_{R,\max}$  rotational velocity of non-holonomic robot, rotational velocity limit
$\Omega$  spatial neighborhood around image feature, respectively point of interest
$p_i$  world point
$p_i$  point in image plane
$p_v$  point in virtual camera plane
$\pi(s, a)$  optimal policy (for appearance based visual servoing)
$\varphi$  canonical orientation of the keypoint
$\phi, \phi_{\mathrm{ref}}$  current and reference angle between two points forming a line relative to the horizontal line
$q$  robot joint angles
$\dot{q}$  robot joint velocities
$Q$  action value function (for appearance based visual servoing)
$r$  camera position
$\dot{r}$  camera velocity
$r_f$  horizontal distance from focus to parabola of an omnidirectional camera
$r_{XY}$  Pearson's correlation coefficient describing the linear dependency between two stochastic variables $X$ and $Y$
$R$  rotation matrix
$\rho$  error function of the M-estimator
$\rho, \alpha$  polar coordinates
$s$  object appearance (in angular color cooccurrence histograms)
$\sigma$  image feature scale, especially in the context of SIFT and SURF features, also referred to as the standard deviation of the Gaussian
$\sigma_e$  parameter to regulate outlier suppression (for error function of the M-estimator)
$\sigma_u, \sigma_v$  variance of the feature distribution
$t$  translation vector
$ttc, ttc_{\mathrm{avg}}$  time to contact, mean time to contact
$ttc_{nv}$  one of the $m$ total time to contact estimates computed from the corresponding flow vectors
$T^{C_{\alpha,\beta}}_{C_R}$  transformation from the camera coordinate system to the rotated camera coordinate system
$T^{C_{\mathrm{ref}}}_{C_{\alpha,\beta}}$  transformation of the rotated camera coordinate system into the static camera coordinate system
$T^{C_V}_{C_{\mathrm{ref}}}$  transformation from the fixed reference frame centered at the focal point to the virtual camera plane
$T^{C_V}_{C_R}$  transformation from the camera plane to the horizontal virtual camera plane

$T_{\mathrm{ext}}$  extrinsic homogeneous transformation matrix
$T_{\mathrm{int}}$  intrinsic homogeneous transformation matrix
$\theta_{az}, \varphi_{el}, r_{sc}$  reference azimuth, elevation and radius in spherical coordinates
$\hat{\theta}_{az}, \hat{\varphi}_{el}, \hat{r}_{sc}$  estimated azimuth, elevation and radius in spherical coordinates
$\theta_{icp}$  intrinsic camera parameter: angle between the axes of the retinal image
$\Theta_m$  model parameters (for error function of the M-estimator)
$\Theta_R$  orientation of the robot
$u$  pixel coordinate in $x$-direction of the camera coordinate system
$[u, v, 1]^T$  homogeneous 2D image coordinates
$[\hat{u}, \hat{v}, 1]^T$  normalized 2D image coordinates
$[\bar{u}, \bar{v}]^T$  deviation of the feature centroid from the origin
$[\dot{u}, \dot{v}]^T$  optical flow
$[u_0, v_0]^T$  intrinsic camera parameter: principal point describing the intersection of the optical axis with the image plane
$[u_{cog}, v_{cog}, 1]^T$  feature centroid of current view
$[\hat{u}_{cog}, \hat{v}_{cog}, 1]^T$  feature centroid of goal view
$[u_V, v_V, 1]^T$  2D image coordinates in the virtual camera plane
$[u_{vcog}, v_{vcog}]$  centroid of the $u$-, respectively $v$-coordinate of the current view expressed in the horizontal virtual camera plane after the feature rotation about $\Delta\Theta_R$
$[\hat{u}_{vcog}, \hat{v}_{vcog}]$  centroid of the $u$-, respectively $v$-coordinate of the reference view expressed in the horizontal virtual camera plane after the feature rotation about $\Delta\Theta_R$
$u_{\xi}$  axis of rotation parametrization
$U \Lambda V^T$  singular value decomposition (SVD) of a matrix
$v$  velocity
$v$  pixel coordinate in $y$-direction of the camera coordinate system
$v_R$  translational velocity of non-holonomic robot, $v_R$ is composed of $v_{R,z}$ in longitudinal direction and $v_{R,x}$ in lateral direction
$v_{R,\mathrm{Left}}, v_{R,\mathrm{Right}}$  commanded velocity for the left and right wheel of the robot, respectively
$v_{R,\max}$  translational velocity limit
$w_i$  dynamic weight for decoupling $f_x$ and $f_y$
$w_{i,\mathrm{norm}}$  normalized dynamic weight (to be independent of the distance $z$)
$w(u, v)$  weighting function, e.g. for optical flow or Hesse matrix
$x$  position $[x, y, z]$ and orientation $[\alpha, \beta, \gamma]$ of the end-effector
$[x, y, z, 1]^T$  homogeneous point coordinates
$[x_R, z_R, \theta_R]^T$  state of non-holonomic robot
$x_i$  data point (for error function of the M-estimator)
$[X, Y]^T, [\bar{X}, \bar{Y}]^T$  stochastic variables, mean values of stochastic variables
$\xi$  angle of rotation parametrization
$z_f$  horizontal axis of parabolic mirror

1 Introduction

In the future service robots are supposed to liberate people from the burden of monotonous and tedious tasks. Robots perceive their environment by means of force, touch, proximity or visual feedback with the objective to perform complex manipulation tasks in dynamic, unstructured environments of a complexity that exceeds the capabilities of current robotic manipulators in industrial settings. Pickup and delivery tasks constitute a novel domain of application for intelligent service robots. This development is triggered by more powerful and affordable sensors, increased computational power and the advent of lightweight manipulators. This thesis is a contribution towards the goal of realizing mobile manipulation with autonomous service robots.

Vision, being the most important human sensor and feedback system, is considered to play a prominent role in the future of robotics. Mobile manipulation in service robotic applications requires localization, navigation, object recognition as well as object manipulation. All these tasks are achieved with advanced sensors such as expensive laser scanners, affordable sonar as well as camera systems. Several tasks like obstacle avoidance and 3D world modeling are easily achieved by applying laser sensors. In order to disseminate service robots on a broad scale, their costs have to be reduced. Thus, new territory has to be entered in order to replace laser scanners with cameras as a universal sensor. Camera systems offer the major advantage that they enable the recognition of objects as well as people including their gestures and facial expressions, in addition to their applicability for localization and navigation. They provide high dimensional and noisy data requiring information processing and reasoning in order to compensate for the information complexity compared to lasers. Therefore this thesis focuses on the challenging task to achieve mobile

1.1 Mobile manipulation

A general comprehensive outline of mobile manipulation is given by the Technical Committee on Mobile Manipulation:

"The ultimate goal of Autonomous Mobile Manipulation is the execution of complex manipulation tasks, in unstructured and dynamic environments, in which cooperation with humans may be required. To achieve this goal, several scientific and engineering challenges, currently beyond the state of the art in robotics, must be addressed." [146].

Mobile manipulation necessitates different skills such as planning, localization as well as deliberative navigation and object recognition in conjunction with object manipulation. The complexity of this mission arises from the high dimensional perceptual data afflicted with uncertainties as well as system complexity that emerges from the mobile platform itself but even more from the dynamics and ambiguities of the environment.

Given a scenario in which the human instructs the mobile platform with tasks such as table setting or pickup and delivery, the robot first of all has to localize itself in its dynamic environment, as neither offices nor households are static. Localization is essential for planning as well as mission supervision. After the problem "Where am I?" is solved, navigation is required in order to address the problem of "How to get from A to B?". The navigation is supposed to guide the robot towards a goal destination, for example passing a door, while simultaneously avoiding collisions. A large variety of different navigation schemes is provided in literature, mostly using combinations of different sensors. This thesis follows the paradigm of purely vision-based navigation, neglecting other kinds of sensors and merely utilizing image data. Therefore all important skills for navigation of autonomous mobile robots such as obstacle avoidance, natural landmark orientation for goal-oriented navigation as well as door passing are designed solely based on visual perception. The skills for navigation using vision are supposed to be efficient to implement and robust to guarantee the safe operation of the mobile platform.

Once the designated goal location is reached the mobile platform needs to recognize and handle daily objects in household environments. The object recognition and manipulation rely on the same object representation, which is sparse in order to fulfill memory constraints of the underlying hardware. The task of object manipulation consists of the alignment of the end-effector with recognized objects of unknown pose. Image-based visual servoing provides a means of model-free manipulation of objects solely relying on 2D image information. Therefore this thesis provides a significant step towards manipulation of daily objects relying on natural texture even if the grasp pose of the object is outside the current view of the object.

equipped with sonar sensors. Two camera systems, a monocular pan-tilt camera and an omnidirectional camera, are mounted on the platform for localization, navigation and object recognition. A manipulator with a two-finger gripper from Neuronics is installed on the platform. The eye-in-hand camera is designated for closed-loop object manipulation. The manipulator reduces the field of view of the omnidirectional camera. This imposes no constraint on the navigation with the omnidirectional camera described later on, because the remaining field of view of around 300° still contains all relevant environmental contents.

Figure 1.1: Mobile robot. (Labelled components: gripper camera for object grasping, manipulator, omnidirectional camera, pan-tilt camera, sonar sensors, mobile platform.)

1.2 Related work

The mobile platform is provided with an Advanced Robot Interface for Applications (ARIA) [100]. ARIA already incorporates control of the robot's velocities, odometric, sonar and laser measurements as well as collision-free navigation due to reactive behaviors based on its sonar or laser data. In order to achieve goal-oriented navigation additional packages

Nav-control of the robot's actions, e.g. the progress of the task in the map, are at the disposal of the customer. The customer has a fully operational robot with these packages, which navigates after an initial mapping stage without collisions in a goal-oriented manner in dynamic environments. To achieve even more complex tasks in the context of service robotics such as human recognition, human-machine interaction as well as object recognition and manipulation, additional sensors for visual perception are required. While a service robot inherits more tedious tasks from humans, it is indispensable to reduce the overall costs, especially for the hardware, in order to finally achieve the economic breakthrough in the consumer market. Therefore the motivation arises to design the crucial capabilities such as localization and navigation as well as advanced skills such as object manipulation with a single cost-efficient sensor system in conjunction with highly advanced control methodologies, rather than employing multiple kinds of expensive sensors in parallel. This trend from hardware to software intelligence occurs in many industry branches with severe pricing pressure, e.g. the automotive industry. Cameras represent an efficient solution to this dilemma because the range of possible applications and skills relative to price is much more advantageous compared to laser. Therefore in its first part this thesis aims to achieve a navigation performance with visual perception similar to that of the already existing commercial software with laser sensors. This provides the basis for additional applications such as object manipulation, which are treated in the second half of this thesis.

The robot control is based on a hybrid architecture [15] depicted in figure 1.2, composed of a planning layer, a coordination layer and a subordinate reactive layer. The role of the planning layer consists in generating the mission plan and its surveillance, including global localization of the robot, preloaded path planning for goal-oriented navigation as well as object manipulation. The coordination layer activates or deactivates those reactive behaviors that are necessary for successful realization of the plan and adequate in the current context. It is also responsible for the diagnosis of the robot's status, mission surveillance and emergency or fallback strategies. The operation of the reactive layer follows the behavior based paradigm [18], as it abandons any abstract representation of the environment but decides about the motion commands only based on the current perception provided by the sensors (behavior representation). A behavior is represented by a direct map from the stimulus, for example the distance measurement, to the response, in the case of mobile robots the motor commands. In case of navigation an obstacle avoidance behavior guarantees the safety of the robot with respect to collisions with surrounding objects. Other reactive behaviors, e.g. constant velocity, corridor centering and homing, are primarily useful for local navigation. The object manipulation requires a behavior which transfers the manipulator into a pre-grasping position. This thesis investigates the potential of camera systems to replace the sensor inputs for the planning and the reactive layer and completely dispense with distance sensors such as laser employed in commercially available robot systems.
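The direct stimulus-to-response mapping of a reactive behavior, and its activation by the coordination layer, can be illustrated with a minimal Python sketch. All names (`constant_velocity`, `obstacle_avoidance`, `Coordinator`), the safety distance and the speeds are hypothetical and chosen for illustration only; they are not taken from the robot software described in this thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A motor command: (translational velocity in m/s, rotational velocity in rad/s).
Command = Tuple[float, float]


def constant_velocity(_distances: List[float]) -> Command:
    """Drive straight ahead at a fixed (assumed) cruise speed."""
    return (0.3, 0.0)


def obstacle_avoidance(distances: List[float], safe: float = 0.5) -> Command:
    """Map range readings (ordered left to right, in metres) to a motor command.

    The behavior keeps no map and no internal state: it slows down as the
    nearest obstacle approaches and turns away from the side it was seen on.
    """
    nearest = min(distances)
    if nearest >= safe:
        return (0.3, 0.0)
    idx = distances.index(nearest)
    # Obstacle in the left half: turn right (negative rate), otherwise turn left.
    turn = -0.8 if idx < len(distances) / 2 else 0.8
    return (0.3 * nearest / safe, turn)


@dataclass
class Coordinator:
    """Coordination layer stand-in: selects which reactive behavior is active."""
    behaviors: Dict[str, Callable[[List[float]], Command]]
    active: str

    def step(self, distances: List[float]) -> Command:
        return self.behaviors[self.active](distances)
```

A coordinator created as `Coordinator({"cruise": constant_velocity, "avoid": obstacle_avoidance}, "avoid")` returns one motor command per call to `step`. Replacing the distance readings by quantities derived from images leaves this structure unchanged, which is the substitution investigated in this thesis.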

Figure 1.2: Hybrid three-layer model for robot control with planning, coordination and reactive layer, with laser and sonar as input for localization and navigation in the planning layer as well as for the behaviors in the reactive layer. (Diagram labels: planning layer, coordination layer, reactive layer; mission planning, localization, navigation, object manipulation, path planning, trajectory generation; coordination, optimization, management; diagnosis and surveillance; system monitoring, emergency strategies; behavior selection, behavior adaption, behavior coordination, behavior representation; control, stabilization; sensor (sonar, laser); actuators.)

mobile robot navigation [39℄ distinguish between indoorand outdoor navigation. A

om-prehensive overview for visual navigation is provided by [16℄, whih ategorizes visual

navigation as map-based navigation and mapless navigation, whereas map-based

naviga-tion is subdivided into metri and topologial map-based navigation. Metri maps

repre-sent the environment inrelative oordinates with respet to anabsolute world oordinate

system, whereas topologial maps possess a graph-like struture with nodes and edges,

representing abstrat loations and the repertoire of behaviors to transit between them

without any geometri information [86℄. Loalization tehniques using laser sensors are

well-established [51℄. A frameworkfor Simultaneous LoalisationAnd Mapping (SLAM)

is provided by [147℄ by building amap fromsrathwhile ontinuously loalizingitself in

the online generated map. Eient approahes suh as FAST-SLAM [102℄ ahieve

nowa-days real-time mapping of the environment. Despite the substantial progress regarding

VSLAM (Visual SimultaneousLoalisationAndMapping) [142,36, 138℄,maps provided

by VSLAM using loal feature extration are sparse and therefore not dense enough for

metri navigation required by standard laser based navigation shemes. However, these

mapsare suitedfor robotloalization[76℄. Reentapproahes [148℄generate o-linedense

3D maps due to stereo vision with additionally integrated landmarks, nonetheless the

overall loalization is inferior to simple topologial loalization approahes using

omnivi-sion suh as [55℄. In [44℄ a VSLAM sheme provides a 3D-voxel map by FAST-SLAM in

(20)

al map-based navigation paradigm using passive visual sensors, representing environments by a directed graph. Topological maps require less memory and are suitable for the representation of large indoor environments. Topological SLAM using local feature extraction is presented in the works of [155, 3], which seems to outperform appearance-based visual SLAM by global feature extraction [65]. The choice of the localization methodology has a direct impact on the required collection of behaviors (referred to as mapless navigation in [16]). Topological map-based navigation requires visual perception representing the visual nodes, also referred to as waypoints, as well as the visual behaviors associated with the edges in order to navigate between them. Depending on the degree of integration of the image processing systems into the hybrid control architecture, the approaches are classified throughout this work into vision-guided and visual navigation schemes. Visual navigation solely uses visual information as input for the planning as well as for the reactive layer, whereas vision-guided approaches are supplemented by active distance sensors such as sonar or laser sensors providing further input for the reactive layer.
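The representation of a topological map as a directed graph with behaviors attached to the edges can be illustrated by a minimal sketch. The waypoint names, the behavior names and the function `plan_behaviors` are hypothetical and chosen for illustration only; the breadth-first search is merely one simple way to obtain a behavior sequence and is not claimed to be the planner used in this thesis.

```python
from collections import deque

# Hypothetical topological map: nodes are waypoints, each directed edge
# carries the name of the visual behavior that moves the robot along it.
TOPOLOGICAL_MAP = {
    "office": {"corridor": "door_passing"},
    "corridor": {"office": "door_passing", "lab_door": "corridor_centering"},
    "lab_door": {"lab": "door_passing"},
    "lab": {},
}


def plan_behaviors(graph, start, goal):
    """Breadth-first search returning the behavior sequence from start to goal.

    Returns None if the goal cannot be reached along the directed edges.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, behaviors = queue.popleft()
        if node == goal:
            return behaviors
        for neighbor, behavior in graph[node].items():
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, behaviors + [behavior]))
    return None
```

In this sketch the planning layer only selects which behavior to activate next, while the execution of each edge is left to the reactive layer.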

Visual reactive behaviors omit metric maps for representing the environment; instead they perceive and track objects by coupling the immediate decision about the robot movement directly with the visually observed appearance of the local environment. Such approaches are either based on locating specific landmarks in the environment, follow an appearance-based approach [154], or measure the optical flow [4]. The corridor centering described in [4] operates by balancing the optical flow in the right and left hemisphere of an omnidirectional camera system; however, it fails if texture is missing or non-uniformly distributed in the corridor environment. Vision-based navigation in unstructured environments solely uses natural features and structures without adding supplementary landmarks or texture elements to facilitate the navigation task. [105] describes a vision-based homing behavior with gaze control for decoupling the camera and the robot movement via a virtual camera plane. However, in this context the environment is structured systematically by placing landmarks at selected waypoints to support vision-based navigation.

Robotic manipulation of daily-life objects in unstructured environments is an essential requirement in service robotic applications. Image-Based (IBVS) and Position-Based Visual Servoing (PBVS) are gaining visibility due to their importance for robotic manipulation and grasping. Visual servoing is defined in the standard tutorial [70] as:

"the use of one or more cameras and a computer vision system to control the position of the robot's end-effector relative to the work piece as required by the task".

Position-based visual servoing estimates the object's pose relative to the camera, as the error between the actual and the goal pose is defined in the Cartesian space. The main drawbacks of position-based visual servoing are 3D model generation of objects, on-line estimation of 3D pose, system instabilities because of coarse pose estimations as well as objects leaving


for grasping is demonstrated to the robot during a learning stage and a set of reference features is extracted from the image. A geometric object model or an explicit reconstruction of the object scene becomes obsolete for image-based visual servoing. Due to these two major advantages, namely being model-free and easy to demonstrate for the instructor, this approach is particularly promising for mobile manipulation.

The categorization of [24] and [25] for different image-based visual servoing concepts is pursued and different approaches in literature are ranked regarding their applicability to mobile manipulation. Jacobian-based visual servoing inverts the analytical relation between differential changes in task space and differential changes of pixel coordinates to reduce the error in the image space between the actual and desired feature coordinates [151]. Hybrid visual servoing defines the error between actual and desired pose partially in image and Cartesian space [26]. Partitioned visual servoing, respectively visual servoing with decoupled image moments, defines image moments which are related approximately in a one-to-one relationship to their degrees of motion, resulting in a simple linear control problem in the image space [143]. Appearance-based visual servoing [37] captures the overall appearance of an object rather than single features and relates this appearance by an offline learned interaction matrix to control values to steer the end-effector into the reference pose. Other approaches for visual servoing such as visual servoing on epipoles [120] or by structured light are neglected because of their minor importance for service robotics.
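The Jacobian-based concept can be illustrated by the classical control law in its standard textbook form, which is not necessarily the exact formulation of [151]. The symbols are notation introduced here for illustration: $\mathbf{s}$ and $\mathbf{s}^{*}$ denote the actual and desired feature vectors, $\mathbf{L}_s$ the interaction matrix (image Jacobian), $\mathbf{v}_c$ the camera velocity and $k$ a positive gain ($k$ is used because $\lambda$ denotes the focal length in this thesis).

```latex
\begin{align}
  \mathbf{e} &= \mathbf{s} - \mathbf{s}^{*}, \\
  \dot{\mathbf{s}} &= \mathbf{L}_s \, \mathbf{v}_c, \\
  \mathbf{v}_c &= -k \, \mathbf{L}_s^{+} \, \mathbf{e}, \qquad k > 0 .
\end{align}
```

For a static target and an exact, full-rank interaction matrix this choice imposes $\dot{\mathbf{e}} = -k\,\mathbf{e}$, i.e. an exponential decrease of the image error.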

Figure 1.3 depicts a radar chart in order to compare different visual servoing concepts with respect to various aspects. Visual servoing by image Jacobian, hybrid visual servoing, visual servoing by decoupled moments as well as appearance-based visual servoing are compared regarding stability, calibration issues, convergence, compliance with service robotic specifications and biology inspiration. Stability is divided into global asymptotic and local asymptotic stability as well as heuristic approaches for stability analysis, e.g. convex polygons. Hybrid visual servoing has the highest ranking due to its global asymptotic stability. Appearance-based visual servoing has the lowest ranking as the stability analysis of the optimal policy (feed-forward) is not analytically feasible. On the contrary, appearance-based approaches require in principle no intrinsic or extrinsic camera calibration and therefore achieve the highest ranking in this category. Nonetheless, even if the three other approaches require intrinsic camera calibration, this is nowadays no severe limitation because of the standard tools for camera calibration [136]. The aspect of convergence contains computational complexity as well as the convergence (behavior) of the image error and the task space error, in addition to the required actuating variables. Hybrid visual servoing and visual servoing with decoupled moments exhibit fast convergence in conjunction with low computational complexity. The computational complexity of course highly depends on the feature extraction methodology and its application parameters. On the contrary, appearance-based visual servoing has high computational demands for extracting appearance, whereas Jacobian-based approaches partially show slow convergence depending on the relative pose between


regarding occlusion, unstructured cluttered environments with highly structured objects as well as changing light conditions. Additionally, object recognition as well as visual servoing should rely on the same object representation in order to reduce memory requirements. Appearance-based visual servoing requires accurate object segmentation to discriminate different object poses, which is difficult to achieve in textured environments. Nonetheless, this methodology directly fulfills the requirement for the same object representation for recognition and positioning. Feature-based approaches in literature are presented most frequently using simple feature primitives such as [135]. These features are very efficient to implement but not realistic for service robotic applications because of their low perceptibility across large regions of the workspace as well as their minor ability to discriminate among different objects. The potential of feature-based approaches is much more promising than appearance-based visual servoing concerning robustness due to feature redundancy and under the assumption of a solved correspondence problem. Even if appearance-based approaches are ranked highest in the category biology inspiration, these approaches are suboptimal regarding the other categories and are therefore not pursued in the context of this thesis. It is an interesting point that approaches adopted from nature are less robust than purely technically motivated methodologies regarding mobile manipulation.

In conclusion, visual servoing with decoupled moments and hybrid visual servoing are best suited for service robotic applications and are further investigated to achieve full applicability for mobile object manipulation. Furthermore, this thesis postulates visual servoing with decoupled moments, as neither a partial pose estimation requiring intrinsic camera calibration nor geometric assumptions about the scene are required. Exploiting the potential of visual servoing with decoupled image moments regarding decoupling the translational and rotational degrees of freedom as well as fulfilling service robotic specifications is a challenging task. The authors in [117], however, state that:

"Finding a set of visual features which produces a decoupled interaction matrix for any camera pose seems an unreachable issue".

Nonetheless, a diagonal interaction matrix is much desired and therefore investigated in the context of this thesis, with the success of finding a resulting interaction matrix with only four remaining couplings independent of the camera pose.

In many scenarios the features extracted in the reference pose are only perceivable across a limited region of the work space. Different terminologies are reported in literature for visual servoing across several intermediate reference views of the object in order to navigate towards the final reference pose. Path planning in image space [97], visual servoing due to visual memory [123] as well as large view visual servoing [105] are conceptualized for global visual servoing. Notice that local visual servoing is defined by the visual servoing towards a single reference image, whereas global visual servoing is concerned with the navigation


[Figure 1.3: radar chart. Concepts: image Jacobian, hybrid, decoupled moments, appearance. Axes: stability proof (global, local, heuristic); calibration (none, intrinsic, extrinsic); convergence (slow, medium, fast); service robotics specifications (inapplicable, partially, fulfilled); biology inspired (human-like, partially, technical).]

Figure 1.3: Characteristics of different visual servoing concepts regarding stability, convergence, service robotic specifications and biology inspiration.

the desired pose by switching between reference views is the ultimate goal of the cited approaches. Global visual servoing is a challenging task, which is imperative to achieve mobile manipulation independent of the object's initial view in the camera image.

1.3 Objective of this thesis

This thesis provides a contribution towards mobile manipulation in unstructured environments with the ambitious goal to accomplish all skills and tasks exclusively by means of visual perception. In order to achieve mobile manipulation solely relying on visual perception, this work yields new insights in two major domains, namely visual navigation in the


• How to achieve time-optimal visual homing for mobile robots dealing with natural texture in dynamic environments with camera systems with limited field of view requiring gaze and position control in parallel?

• How to design collision-free navigation using omnivision considering noisy image measurements and sparsely textured office environments?

• How to accomplish door detection, door tracking and door passing in a coherent purely vision-based framework with closed-loop door traversing?

• How to design visual navigation in unstructured office environments with comparable performance to state-of-the-art approaches using sonar and laser sensors?

Visual servoing for object manipulation is mainly concerned with the following challenges:

• How to achieve markerless and decoupled visual servoing for optimal convergence in task space in the context of object manipulation of daily objects?

• How to realize time-optimal visual positioning of the gripper relative to an object even if the desired grasping position is outside the current field of view of the camera?

• Which strategy is better in the context of service robot applications: a look-then-move strategy in conjunction with local visual servoing close to the reference pose, or visual servoing over several reference images?

This thesis is organized as follows: Chapter 2 provides the state of the art of computer vision as well as visual servoing in order to keep this thesis self-contained. Chapter 3 is dedicated to the progress from vision-guided navigation with laser-based stimuli to purely vision-based navigation relying solely on visual stimuli. Global visual homing based on visual servoing with an omnidirectional camera in conjunction with a pan-tilt camera is introduced in chapter 4. A comparison of vision-guided and visual navigation is additionally provided at the end of chapter 4. In order to accomplish mobile manipulation, chapter 5 demonstrates a novel approach for markerless and decoupled visual servoing to align the robot end-effector with recognized objects of unknown pose. Conventional point features are augmented by additional attributes like scale and orientation, which establish a one-to-one correspondence between the individual image moment and its corresponding degrees of freedom. The limited visibility of features necessitates the introduction of additional intermediate reference views of the object and requires path planning in view space. Therefore a new methodology for global (large view) visual servoing is introduced in chapter 6. The path planning in the image space is flexible as the decoupled visual servoing relies on a dynamic set of feature correspondences rather than a static set of individual features. This property allows the online selection of optimal reference views during servoing to the goal view, resulting in time-optimal control. Finally, this thesis concludes with a summary and outlook on future work in chapter 7, in which the major


State of the art of computer vision and visual servoing

This chapter provides the basis for computer vision and visual servoing, the required terminology for the comprehension of this thesis as well as the classification of this thesis into the scientific context. This chapter is organized as follows: Image formation is described in section 2.1 for perspective and multiple cameras as well as for omnivision. Image understanding by robust feature detection for object recognition is treated in section 2.2. The two major topics, visual navigation and image-based visual servoing, are described in detail in sections 2.3 and 2.4, respectively, and the experimental systems in section 2.5.

2.1 Perspective camera, multiple-view geometry and omnivision

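The general perspective projection model describes the relation between a homogeneous point $\mathbf{p}_c = (x, y, z, 1)^T$ in the 3D camera coordinate system and its projection onto the 2D image coordinate system in homogeneous coordinates $\mathbf{p} = (u, v, 1)^T$, where $\lambda$ denotes the focal length:

\[
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \frac{1}{z}
\begin{pmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{2.1}
\]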
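Equation 2.1 can be illustrated by a short numerical sketch. The function name `project` and the check for points behind the camera are assumptions added for illustration.

```python
def project(point_c, focal_length):
    """Project a camera-frame point (x, y, z) to image coordinates (u, v).

    Implements equation 2.1: the homogeneous point is multiplied by the
    3x4 projection matrix and divided by the depth z.
    """
    x, y, z = point_c
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    projection = [
        [focal_length, 0.0, 0.0, 0.0],
        [0.0, focal_length, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    p_h = [x, y, z, 1.0]
    u, v, _ = (sum(row[c] * p_h[c] for c in range(4)) for row in projection)
    return (u / z, v / z)
```

Doubling the depth of a point halves its image coordinates, which is the scale change that the feature detectors discussed later have to cope with.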

The image point $\mathbf{p} = (u, v, 1)^T$ on the retinal image is transformed to the normalized image plane according to equation 2.2. This transformation yields the normalized pixel coordinates $[\hat{u}, \hat{v}, 1]^T$


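comparison of images originating from different camera systems:

\[
\begin{pmatrix} \hat{u} \\ \hat{v} \\ 1 \end{pmatrix}
= \mathbf{K}^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
\quad \text{with} \quad
\mathbf{K} =
\begin{pmatrix}
\alpha_u & -\alpha_u \cot(\theta_{icp}) & u_0 \\
0 & \dfrac{\alpha_v}{\sin(\theta_{icp})} & v_0 \\
0 & 0 & 1
\end{pmatrix}.
\tag{2.2}
\]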

The intrinsic camera parameters $\alpha_u$ and $\alpha_v$ describe the scaling factors depending on $\lambda$ and the pixel dimensions. The intersection of the optical axis with the image plane is described by the principal point $[u_0, v_0]^T$. Due to manufacturing imperfections of an actual camera, the angle $\theta_{icp}$ between the axes of the retinal image may not be equal to $90^\circ$. The extrinsic camera parameters consider the position and orientation of the camera coordinate system relative to the world coordinate system. To express this relation, the rotation matrix $\mathbf{R}$ and the translation vector $\mathbf{t}$ are combined in a homogeneous transformation matrix $\mathbf{T}_{ext}$:

\[
[u, v, 1]^T = \frac{1}{z}\, \mathbf{T}_{int}\, \mathbf{T}_{ext}\, [x, y, z, 1]^T
\quad \text{with} \quad
\mathbf{T}_{int} = (\mathbf{K} \;\; \mathbf{0}).
\tag{2.3}
\]

Intrinsic camera parameters as well as radial distortions of the pixel coordinates $u$ and $v$ caused by lens imperfections are determined by a camera calibration process [136]. The radial distortion is corrected by a polynomial function of the squared distance between the optical center of the image and the given pixel coordinates (cf. chapter 3.3 in [50]). Detailed information about the complete camera system layout and the image formation process can be found in [67], whereas the standard references [50], [78] mainly focus on the image analysis from low-level to high-level vision.
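The normalization of equation 2.2 can be sketched without an explicit matrix inverse, because the intrinsic matrix is upper triangular. The sketch assumes the common form of K with the entries $-\alpha_u \cot(\theta_{icp})$ and $\alpha_v / \sin(\theta_{icp})$; the function names and the numerical values in the usage note are hypothetical, and lens distortion is ignored.

```python
import math


def intrinsic_matrix(alpha_u, alpha_v, u0, v0, theta_icp=math.pi / 2):
    """Build the upper-triangular intrinsic matrix K (assumed form, see lead-in)."""
    cot_theta = math.cos(theta_icp) / math.sin(theta_icp)
    return [
        [alpha_u, -alpha_u * cot_theta, u0],
        [0.0, alpha_v / math.sin(theta_icp), v0],
        [0.0, 0.0, 1.0],
    ]


def apply_intrinsics(K, u_hat, v_hat):
    """Map normalized coordinates to pixel coordinates: [u, v, 1]^T = K [u_hat, v_hat, 1]^T."""
    u = K[0][0] * u_hat + K[0][1] * v_hat + K[0][2]
    v = K[1][1] * v_hat + K[1][2]
    return u, v


def normalize_pixel(K, u, v):
    """Apply K^-1 by back substitution, exploiting the triangular structure of K."""
    v_hat = (v - K[1][2]) / K[1][1]
    u_hat = (u - K[0][2] - K[0][1] * v_hat) / K[0][0]
    return u_hat, v_hat
```

For example, with $\alpha_u = \alpha_v = 800$, a principal point of (320, 240) and perpendicular image axes, the pixel (720, 440) maps to normalized coordinates of approximately (0.5, 0.25).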

Multiple view geometry is concerned with partial or full 3D reconstruction of the environment based on multiple views of a scene. The essential and fundamental matrices describe the epipolar constraint for calibrated and uncalibrated camera systems, respectively, which relates a point in one image to a line in the other independent of the scene's geometry [90]. The essential matrix is stated as:

\[
\mathbf{E} = [\mathbf{T}_x]\, \mathbf{R},
\tag{2.4}
\]

where the vector $\mathbf{t}$ is expressed as a skew-symmetric matrix $\mathbf{T}_x$ so that $\mathbf{t} \times \mathbf{x} = [\mathbf{T}_x]\, \mathbf{x}$. The essential matrix degenerates for small translations, rendering it unsuitable for automatic control engineering topics such as visual servoing or image-based oscillation measurements.
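The epipolar constraint and the degeneracy for vanishing translation can be checked numerically with a minimal sketch. It assumes the convention that a point transforms between the two camera frames as X2 = R X1 + t, which is an assumption made here and not stated in the text; all function names are illustrative.

```python
def skew(t):
    """Skew-symmetric matrix of t, so that skew(t) applied to x equals t x x."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def mat_mul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def essential_matrix(R, t):
    """Essential matrix E = [T_x] R according to equation 2.4."""
    return mat_mul(skew(t), R)


def epipolar_residual(E, x1, x2):
    """Value of x2^T E x1, which vanishes for a correct correspondence."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))
```

For a pure rotation (t = 0) every entry of E is zero, so the residual vanishes for arbitrary point pairs and the constraint carries no information, which is the degeneracy mentioned above.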

The homography $\mathbf{H}$, however, describes a point-to-point transformation between two perspective views of a plane:

\[
a_h\, [\hat{u}_2, \hat{v}_2, 1]^T = \mathbf{H}\, [\hat{u}_1, \hat{v}_1, 1]^T
\quad \text{with} \quad
\mathbf{H} = \mathbf{R} + \frac{\mathbf{t}\, \mathbf{n}^T}{d},
\tag{2.5}
\]

where $\mathbf{R}$ and $\mathbf{t}$ are defined by the rotation and translation between the optical camera centers, $\mathbf{n}$ is the normal vector of the plane and $d$ the distance between the optical center of the first camera and the plane. Contrary to the essential matrix the homography matrix


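The homography is estimated from at least four corresponding features located on a common plane, assuming that the scaling factor $\hat{h}_{33} = 1$, via:

\[
\mathbf{p}_2 = \hat{\mathbf{H}}\, \mathbf{p}_1
\;\Leftrightarrow\;
\begin{pmatrix} \hat{u}_2 \\ \hat{v}_2 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\hat{h}_{11} & \hat{h}_{12} & \hat{h}_{13} \\
\hat{h}_{21} & \hat{h}_{22} & \hat{h}_{23} \\
\hat{h}_{31} & \hat{h}_{32} & \hat{h}_{33}
\end{pmatrix}
\begin{pmatrix} \hat{u}_1 \\ \hat{v}_1 \\ 1 \end{pmatrix},
\tag{2.6}
\]

where $\hat{\mathbf{H}}$ is, apart from a scaling factor $a_h$, identical to the actual homography matrix $\mathbf{H}$. The estimated homography $\hat{\mathbf{H}}$ is decomposed via singular value decomposition into the unknown rotation matrix, the scaled direction vector as well as the normal vector [47]:

\[
\hat{\mathbf{H}} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{V}^T
\;\Leftrightarrow\;
\boldsymbol{\Lambda} = \mathbf{U}^T \hat{\mathbf{H}}\, \mathbf{V}
\;\Leftrightarrow\;
\boldsymbol{\Lambda} = \mathbf{U}^T (d\, \mathbf{R} + \mathbf{t}\, \mathbf{n}^T)\, \mathbf{V}.
\tag{2.7}
\]

As the decomposition of the homography yields ambiguous solutions, the correct solution is obtained by taking into account only the physically plausible solutions and a subsequent comparison of the estimated with the assumed normal vector. Multiple view geometry for partial or complete real world reconstruction, e.g. by homography, is treated extensively in the works of [60] and [134].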
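The estimation step can be sketched for the minimal case of exactly four correspondences. With the entry h33 fixed to 1, each correspondence contributes two linear equations in the remaining eight entries, and the resulting 8x8 system is solved by Gaussian elimination. The function names are illustrative; a practical implementation would normally use more than four features, a least-squares solution and outlier rejection, none of which is shown here.

```python
def solve_linear(A, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_homography(pts1, pts2):
    """Estimate H with h33 = 1 from exactly four correspondences pts1 -> pts2."""
    if len(pts1) != 4 or len(pts2) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    A, b = [], []
    for (u1, v1), (u2, v2) in zip(pts1, pts2):
        # u2 * (h31*u1 + h32*v1 + 1) = h11*u1 + h12*v1 + h13, likewise for v2
        A.append([u1, v1, 1.0, 0.0, 0.0, 0.0, -u2 * u1, -u2 * v1])
        b.append(u2)
        A.append([0.0, 0.0, 0.0, u1, v1, 1.0, -v2 * u1, -v2 * v1])
        b.append(v2)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, pt):
    """Map a point through H and divide by the homogeneous scale."""
    u, v = pt
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)
```

The division by the homogeneous scale in `apply_homography` plays the role of the scaling factor $a_h$ in equation 2.5.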

Conventional monocular cameras have a limited field of view. In order to overcome this constraint, omnidirectional cameras, also referred to as catadioptric cameras, consist of a combination of lenses (refractive, i.e. dioptric) and mirrors (reflective, i.e. catoptric) to enlarge the field of view. The most important design objective for catadioptric sensors is to achieve a single effective viewpoint, which allows the reconstruction of perspective views and panoramic images with arbitrary orientations. A detailed overview of single viewpoint catadioptric sensors and the image formation process is provided by [8, 53].

[Figure 2.1, panel labels: spheric mirror, vertex, parabolic mirror, focus; points $p_{c1}$, $p_{c2}$, $p_1$, $p_2$.]

Figure 2.1: a) Omnidirectional camera; b) Geometry of a parabolic omnidirectional camera; c) Omnidirectional image.

The omnidirectional sensor used in this thesis consists of a camera DFK-31AF03 from


azimuth and approximately $60^\circ$ in elevation. Figure 2.1 depicts the omnidirectional camera (a), a schematic view of the projection geometry (b) as well as an omniview (c), referred to in the following as omnivision. The catadioptric sensor consists of a parabolic mirror in conjunction with a spheric mirror and a perspective lens system. Parabolic mirrors have an orthographic projection, which guarantees that the light rays from the environment are reflected parallel towards the spheric mirror. The spheric mirror also satisfies the single viewpoint constraint, where the center of projection lies in the center of the sphere. A sharp single viewpoint image is obtained as the center of the sphere coincides with the focal point of the perspective lens system. Figure 2.1 b) shows the geometry of such a parabolic omnidirectional camera. The world points $p_{c1}$ and $p_{c2}$ are orthographically reflected to the points $p_1$ and $p_2$ in the image plane. The vertex of the parabola has the distance $h/2$ to the focus, which is the single viewpoint of the parabola. The parameter $h$ is also the radius $r_f$ at $z_p = 0$. Thus, the expression for the reflecting surface follows as:

\[
z_p = \frac{h^2 - r_f^2}{2h}.
\tag{2.8}
\]

In figure 2.1 c) the omniview is presented, which shows the blind spot in the center, an analogy to the human eye, originating from a pin in the center of the spheric mirror to prevent multiple reflections.

Omnivision is well suited for mobile robot applications as it captures the entire surrounding, which facilitates robot localization as well as robot navigation. Furthermore, due to their large field of view, omnidirectional camera systems are optimal for workspace surveillance of product assistants [141].

2.2 Robust point feature detection for recognition

For developing vision-based control concepts for mobile manipulation in unstructured environments, unambiguous and recognizable features have to be extracted from the camera images. Contrary to the industrial context, where markers or labels are imprinted on objects and in the surrounding environments, this approach is not feasible for service robotic tasks. Thus, the algorithms employed in this thesis have to recognize the features in the camera image if the camera-object distance changes (scaling invariance), the lighting conditions vary, the camera rotates around its optical axis or is subject to affine transformations. Associating the same feature in different perspectives is referred to as the correspondence problem.

In the following, two prominent and useful algorithms from literature for local feature extraction and for solving the correspondence problem are presented in detail. Primarily


extraction and matching. Based on this efficient implementation, a sophisticated method for feature extraction, Scale Invariant Feature Transformation (SIFT), is described which is utilized within the scope of this work.

GFTT consists of a corner detection in order to localize interest points and subsequently track the same feature over consecutive images. Strong corners in the image are detected with the structure tensor (second-moment matrix) of the image gradients according to the ideas of the Harris corner and edge detector [59]:

\[
\mathbf{A} = \sum_u \sum_v w(u, v)
\begin{pmatrix} I_u^2 & I_u I_v \\ I_u I_v & I_v^2 \end{pmatrix},
\tag{2.9}
\]

with the image derivatives $I_u$ and $I_v$ in $u$- and $v$-direction, respectively, and the isotropic weighting $w(u, v)$ such as a Gaussian kernel. The two eigenvalues $\lambda_{eig1}$ and $\lambda_{eig2}$ are extracted from $\mathbf{A}$. If $\lambda_{eig1}$ and $\lambda_{eig2}$ are close to zero, the image region is homogeneous. If one of the two eigenvalues is much greater than the other, the image region contains an edge. A corner is detected only if both eigenvalues have large positive values and satisfy the constraint that $\min(\lambda_{eig1}, \lambda_{eig2})$ is larger than a threshold. The corner represents an interest point which is tracked in consecutive images by a small window $\Omega_s$ assuming purely translational motion. In order to avoid false tracking of features the dissimilarity is measured for a large window $\Omega_l$ as follows:

\[
\varepsilon_d = \iint_{\Omega_l} \left[ I_2(\mathbf{R}\, \mathbf{p}_r + \mathbf{t}) - I_1(\mathbf{p}_r) \right]^2 d\mathbf{p}_r.
\tag{2.10}
\]

If the residual error $\varepsilon_d$ exceeds a certain threshold the feature is classified as lost and is therefore rejected. GFTT are well suited for local feature tracking only and are therefore not suited for advanced service robotic applications. Of course scale invariance can also be achieved by a scale-independent Harris detection using a Gaussian pyramid; nonetheless, the feature extraction described in the following has a better representation of the features suited for recognition even under large displacement and rotations as well as changes in lighting conditions.
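The eigenvalue-based classification of an image window can be sketched as follows. The sketch takes precomputed gradient pairs (I_u, I_v) for the pixels of one window, uses a uniform weighting unless weights are passed, and applies a single hypothetical threshold to both eigenvalues; the function names and the threshold value are assumptions for illustration.

```python
import math


def structure_tensor(grads, weights=None):
    """Accumulate the entries (a, b, c) of A = sum w * [[Iu^2, Iu*Iv], [Iu*Iv, Iv^2]]."""
    if weights is None:
        weights = [1.0] * len(grads)
    a = b = c = 0.0
    for (iu, iv), w in zip(grads, weights):
        a += w * iu * iu
        b += w * iu * iv
        c += w * iv * iv
    return a, b, c


def eigenvalues(a, b, c):
    """Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], larger first."""
    mean = 0.5 * (a + c)
    dev = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean + dev, mean - dev


def classify_window(grads, threshold):
    """Label a window as 'corner', 'edge' or 'homogeneous' from its eigenvalues."""
    lam_max, lam_min = eigenvalues(*structure_tensor(grads))
    if lam_min > threshold:
        return "corner"
    if lam_max > threshold:
        return "edge"
    return "homogeneous"
```

A window whose gradients all point in the same direction yields one vanishing eigenvalue and is labeled as an edge, whereas gradients in two different directions yield two large eigenvalues.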

Scale Invariant Feature Transformation, introduced by Lowe [93], is an approach to detect and extract local features from an image with a methodology similar to GFTT but with superior performance in terms of recognition, because it combines the progress made in image processing since the first presentation of GFTT. SIFT features demonstrate invariance with respect to scale, orientation and illumination. SIFT features are conveniently matched across similar views of the same scene. The utilization of specific markers in vision-based applications becomes obsolete as the environment and textured objects naturally contain suitable SIFT features. SIFT features are distinguishable as their associated keypoint descriptor includes a compact, albeit specific representation of the surrounding image region. These properties make them particularly suitable for vision-based localization, visual servoing, object recognition and pose estimation. As their properties are essential for the visual controllers introduced later on, the four major computation stages are briefly described.


different scales. The scale of the SIFT feature is defined by $\sigma$. The difference of Gaussians is calculated from the difference of convolved images at neighboring scales $\sigma$ and $k\sigma$, respectively. Given a Gaussian-blurred image $L$,

\[
L(u, v, \sigma) = G(u, v, \sigma) * I(u, v)
\quad \text{where} \quad
G(u, v, \sigma) = \frac{1}{2\pi\sigma^2} \exp\!\left( -\frac{u^2 + v^2}{2\sigma^2} \right)
\tag{2.11}
\]

is a variable-scale Gaussian, $I$ denotes the image to be processed and $*$ is the convolution operator. The convolution of an image with a DoG filter is defined by

\[
D(u, v, \sigma) = \left( G(u, v, k\sigma) - G(u, v, \sigma) \right) * I(u, v) = L(u, v, k\sigma) - L(u, v, \sigma).
\tag{2.12}
\]

The convolved images are grouped by octaves, which correspond to doubling the value of $\sigma$, resulting in a pyramid of DoG images with different scales.
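Equations 2.11 and 2.12 can be illustrated by a small sketch that computes a single DoG image. It uses the usual Gaussian with $2\sigma^2$ in the exponent, truncates and renormalizes the sampled kernel, replicates border pixels and chooses $k = \sqrt{2}$; these choices and the function names are assumptions for illustration, and the octave pyramid is not built.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled 2D Gaussian G(u, v, sigma), renormalized to unit sum after truncation."""
    kernel = [[math.exp(-(u * u + v * v) / (2.0 * sigma * sigma))
               / (2.0 * math.pi * sigma * sigma)
               for u in range(-radius, radius + 1)]
              for v in range(-radius, radius + 1)]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]


def convolve(image, kernel):
    """Same-size 2D convolution; pixels outside the image replicate the border."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dv in range(-r, r + 1):
                yy = min(max(y - dv, 0), height - 1)
                for du in range(-r, r + 1):
                    xx = min(max(x - du, 0), width - 1)
                    acc += kernel[dv + r][du + r] * image[yy][xx]
            out[y][x] = acc
    return out


def difference_of_gaussians(image, sigma, k=math.sqrt(2.0)):
    """D(u, v, sigma) = L(u, v, k*sigma) - L(u, v, sigma) for one pair of scales."""
    radius = int(math.ceil(3.0 * k * sigma))
    blurred_fine = convolve(image, gaussian_kernel(sigma, radius))
    blurred_coarse = convolve(image, gaussian_kernel(k * sigma, radius))
    return [[coarse - fine for fine, coarse in zip(row_f, row_c)]
            for row_f, row_c in zip(blurred_fine, blurred_coarse)]
```

For a single bright pixel the response is negative at the pixel itself and positive in a ring around it, so an isolated blob appears as a local extremum of the DoG image.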

(2) Keypoint localization: The interest points in the image are referred to as keypoints. They are identified as local maxima or minima of the DoG images across the scales. Every pixel in the DoG image is checked for its candidate validity by comparing it with its eight neighbors at the same scale and also with its nine corresponding neighbors in each of the two neighboring scales. If the pixel exhibits either a local maximum or local minimum it is selected as a candidate keypoint. Every candidate keypoint needs interpolation to accurately determine its position. Keypoints with low contrast values are removed and responses along the edges are also eliminated. Once the positions of the keypoints are assigned their orientation can be determined.

(3) Orientation assignment: The orientation of the keypoint is determined using a gradient orientation histogram in the neighborhood of the keypoint. The contribution of each neighboring pixel is weighted by the gradient magnitude and a Gaussian window with a width $\sigma$ that is 1.5 times the scale of the keypoint. Peaks in the histogram correspond to dominant orientations. A separate keypoint is created for the direction corresponding to the histogram maximum and any other direction within 80% of the maximum value. The properties of the keypoints are all described relative to the keypoint orientation to accomplish orientation invariance.

(4) Keypoint descriptor: With the information about the keypoint orientation, a keypoint descriptor is constructed which is a set of orientation histograms on 4 by 4 subregions of the keypoint neighborhood. The histograms are expressed with respect to the keypoint orientation. Each histogram has eight bins and each descriptor has a 4 by 4 array of histograms around its keypoint. Each SIFT feature consists of a normalized keypoint descriptor $\mathbf{d}_{kp}$ with 4 by 4 by 8 = 128 elements.

Matching of SIFT features: Matching of SIFT features involves the determination of corresponding features in two views of the same scene. Therefore the SIFT features are


length 128. In order to make the matching even more robust, the relative rather than the absolute similarity is evaluated using the relationship between the highest and the second highest value of similarity, which is required to exceed a specified threshold.
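The relative similarity criterion can be sketched with the widely used distance-ratio form, in which a smaller Euclidean distance between descriptors means a higher similarity. A match is accepted only if the best distance is clearly smaller than the second best. The function name and the ratio of 0.8 are assumptions for illustration and not the parameters used in this thesis.

```python
import math


def match_features(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) index pairs that pass the distance-ratio test.

    For each descriptor in desc_a the two nearest descriptors in desc_b are
    found; the nearest one is accepted only if its distance is smaller than
    max_ratio times the distance to the second nearest.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches
```

A descriptor that lies almost equally close to two candidates is rejected, which removes ambiguous correspondences at the cost of discarding some correct ones.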

The presented control concepts can be realized identically with SURF (Speeded Up Robust Features) [13, 12] because they also contribute additional attributes such as scale and orientation of the features. Other methods for local feature extraction such as GLOH (Gradient Location and Orientation Histogram) [99], HOG (Histogram of Oriented Gradients) [34] or its significant extension GF-HOG (Gradient Field Histogram of Oriented Gradients) [68] only differ in the methodology to capture the local appearance of the feature descriptor. [127] recently introduced ORB (Oriented FAST and Rotated BRIEF), which combines in an efficient way the keypoint detector FAST [125] with the efficient feature descriptor BRIEF [21]. FAST extracts keypoints even faster than GFTT or SIFT. However, as these methods do not offer any major improvement apart from faster computation, e.g. based on discretization by integral images like SURF, they are not considered further.

Literature reports two distinct approaches to solve the pose estimation problem. Model-based methods rely on the extraction of specific geometric features in the image such as corners and edges. Robust features like SIFT, GFTT or SURF are mandatory for model-based object recognition and pose estimation. Clusters of robust image features are utilized in the first step to recognize the object. Afterwards the extracted features are compared and related to a known geometric model of the object. Efficient and reliable approaches for model-based pose estimation with known correspondences have been proposed by [38, 115]. The drawback of these methods, like any other model-based approach, is the requirement of an a-priori geometric model of the object, an exact camera calibration as well as the solution of the correspondence problem, which becomes inherently more difficult in case of occlusion and ambiguous features. Following the model-based paradigm, [56] therefore describes an approach for the construction of 3D metric models from multiple images taken with an uncalibrated handheld camera for augmented reality applications.

In contrast, global appearance-based methods capture the overall visual appearance of an object, e.g. the multidimensional receptive fields introduced by [132]. They neither depend on the extraction of individual features nor face the correspondence problem. The basic idea is to capture the appearance by statistical representations such as histograms in order to calculate a probability of the object's presence in the current image view, an idea which is inherent to almost every appearance-based approach. The methodology consists roughly of three steps: first, low-dimensional local feature descriptors are calculated on a regular grid on the image; these descriptors are then quantized and aggregated in multi-dimensional histograms; and finally they are compared to stored histograms of known objects exploiting Bayes' rule. The major difference between object recognition by clusters of SIFT features and by means of multidimensional receptive fields can be summarized as


as-descriptor can be determined, thereby exploiting all image information available. Multidimensional receptive fields on the contrary calculate a low-dimensional feature descriptor on a regular grid, thereby giving away information in textured, highly distinguishable image regions and additionally sampling homogeneous regions with less information for the histograms as well. [22] propose distance color cooccurrence histograms for object recognition of multi-colored, textured objects, emphasizing the conservation of geometric information as the major advantage of color cooccurrence histograms compared to regular color histograms. Based on this fundamental idea, [43] propose color cooccurrence histograms for object recognition as well as 1 DOF pose estimation. The angular extension of color cooccurrence histograms is suggested by [106] in the context of pose estimation of robot players (AIBOs) as well as for 2 DOF pose estimation of multi-colored, textured objects [107]. [104] introduce a method that combines appearance and geometric object models in order to achieve robust and fast object detection as well as 2 DOF pose estimation. Their major contribution is the integration of the known 3D geometry of the object during matching and pose estimation by a statistical analysis of the distribution of feature appearances in the view space. Nonetheless, their approach requires a 3D model of the object, which is difficult to generate for objects of complex shape and is therefore the inherent problem of all model-based approaches.

Image-based visual servoing presented in section 2.4 provides the means for model-free object manipulation for service robot applications without prior pose estimation, requiring only an object recognition with e.g. clusters of GFTT or SIFT features and a subsequent control in the image space towards the desired locations of the features in the image plane. This approach leads to a high position accuracy, but nonetheless achieves only local convergence due to viewpoint limitations. Therefore an initial pose estimation is again mostly mandatory as the current object view does not necessarily contain the features close to the manipulation position. Global visual servoing introduced in chapter 6 overcomes the above stated limitations, thereby constituting a promising and more efficient approach compared to model- and appearance-based object recognition and pose estimation, neglecting any model knowledge but still incorporating the high position accuracy.

2.3 Visual navigation

The characterization of the different visual navigation concepts leads to the appraisal of topological map-based navigation with reactive visual behaviors as stated in section 1.3. Visual navigation draws its inspiration from biology, which provides numerous examples of visual behaviors of insects and birds. It is challenging to design behaviors that are not based on distance sensors but on visual stimuli considering the burden of high computational complexity and noisy data. The authors in [1] extract the elements of early vision by


definition of only four fundamental visual primitives, namely color, texture, disparity and optical flow, to be utilized for designing visual behaviors.

Color corresponds to the different wavelengths in the visible range of the light spectrum. It requires model knowledge about the surrounding world, e.g. the color information of objects like doors and side walls. Additionally, the problem of color constancy, i.e. always assigning the same color to a homogeneous monochromatic area in spite of different illumination conditions as described by the Dichromatic Reflection Model [82], is not solved yet. Therefore color is not suited for the navigation in unstructured environments. "Texture is a phenomenon that is widespread, easy to recognise and hard to define" [50]. Texture is understood by two similar but distinct meanings.

(1) Texture is defined as repeated patterns like carpet, hair or grass which have a specific response in the frequency domain, thereby extractable and distinguishable by filter ban