Eye movements and natural tasks in an extended environment


RIT Scholar Works

Theses

Thesis/Dissertation Collections

10-3-2000

Eye movements and natural tasks in an extended environment

Roxanne Canosa

Follow this and additional works at: http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works.


Eye Movements and Natural Tasks in an Extended Environment

Roxanne Canosa

B.S. State University of New York College at Brockport (1998)

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Center for Imaging Science in the College of Science, Rochester Institute of Technology

October 3, 2000

Signature of Author

Roxanne Canosa

Accepted by

THESIS RELEASE PERMISSION

ROCHESTER INSTITUTE OF TECHNOLOGY

COLLEGE OF SCIENCE

Eye Movements and Natural Tasks in an Extended Environment

I, Roxanne Canosa, hereby grant permission to the Wallace Memorial Library of RIT to reproduce my thesis in whole or in part. Any reproduction will not be for commercial use or profit.

Signature of Author

Roxanne Canosa

CHESTER F. CARLSON

CENTER FOR IMAGING SCIENCE

COLLEGE OF SCIENCE

ROCHESTER INSTITUTE OF TECHNOLOGY

ROCHESTER, NEW YORK

CERTIFICATE OF APPROVAL

M.S. DEGREE THESIS

The M.S. Degree of Roxanne Canosa has been examined and approved by the thesis committee as satisfactory for the thesis requirement for the Master of Science degree in Imaging Science

Dr. Jeff B. Pelz, Thesis Advisor

Dr. Eriko Miyahara

Abstract

Eye movements can be thought of as a window onto pre-conscious thought. Patterns of visual fixations over time as well as space can reveal cognitive strategies that are not amenable to conscious control or verbalization. A spatial analysis of an eye movement trace usually emphasizes the role that eye movements have in moving the retinal image of an object of interest from the periphery to the fovea for closer inspection. It is generally believed that a sequence of fixations across a region of space builds up the perception of a high-resolution field of view everywhere. Recent studies have shown that this perception is largely illusory. The visual-perceptual system prefers to maintain a limited internal representation of physical objects in the world and uses the environment as an external source of information, accessing the information only at the time it is needed.

The goal of this research effort was to investigate the role that eye movements have in the performance of everyday tasks in a natural environment. A series of four experiments was conducted that represents an attempt to step away from the classical psychophysical approach of studying eye movements within the confines and control of the laboratory. There exists little precedent for this kind of approach, partly because past research efforts have emphasized a linear systems method to render the analysis tractable, and partly because the technology that is required to perform these experiments has not existed until recently. The hardware that was developed by the Visual Perception Laboratory at RIT specifically addresses the portability concerns that are crucial for successfully studying eye movements during natural tasks in a non-linear extended environment.

A model was developed to describe the temporal sequencing of eye movements in terms of a hierarchical structure of goal-oriented tasks, with individual fixations considered the lowest level of the hierarchy. The analysis gives evidence for the sequencing of eye movements based on a desire to maximize the efficiency of task performance over time by anticipating future activities. The purpose of this sequencing is to enhance interaction with the world under conditions of limited memory representations rather than to create the

I would like to thank and acknowledge all those who helped to make this research project possible. Thanks to Jason Babcock for the countless hours spent developing and fine-tuning the eye-tracker, and for his help in recruiting and running subjects. Thanks also to Amy Silver for her coding expertise and her insights into what constitutes a good track, and to Diane Kucharczyk for her lab experience. I would also like to thank all those who volunteered to be subjects for the experiments, and those who tolerated the sometimes odd intrusions into the hallways and bathrooms. Thanks to Jeff Pelz, Eriko Miyahara, and Mary Hayhoe for their thoughtful insights and recommendations. Finally, thanks to my family for

Table of Contents

List of Figures
List of Tables
1. Introduction
1.1 Overview
1.2 Objectives (Statement of Work)
2. Background
2.1 Historical Perspective
2.2 Eye Movements
2.3 Visual Attention and Selection
2.3.1 Saliency Maps
2.3.2 Feature Integration Theory of Attention
2.4 The World as Anchor
2.4.1 Semantic Consistency
2.4.2 Change Blindness
2.4.3 Exocentric Reference Frames
2.4.4 Position Constancy During Passive Movement
2.5 Perceiving the Direction of Heading During Motion
2.5.1 Retinal vs. Extra-retinal Information
2.5.2 Differential Motion Parallax
2.6 The Effects of Freeing the Head
2.7 Natural Tasks
2.7.1 Memory Representation of a Simple Natural Task - Blocks Copying
2.7.2 Sequential Looking Task - Tapping vs. Looking Only
2.7.3 Visual Memory in Problem Solving - Geometry
2.7.4 Eye Movements and Work Load During Driving
2.7.5 The Direction of Gaze During Driving
2.7.6 Eye Movements While Making Tea

3. Approach
3.1 History of Eye-tracking Methods
3.1.1 Electrical Methods
3.1.2 Optical Methods
3.2 The VPL Portable, Wearable Eye-Tracker
3.2.1 The Custom Goggles Headgear
3.2.2 Other System Components
3.2.3 Theory of Operation
3.2.4 Eye-Tracker Set-Up and Calibration
3.2.5 Eye Movement Monitoring
3.3 Example of Real-Time Data Capture
3.4 Data Analysis
3.4.1 Coding the Data
3.4.2 Fixation Durations and Saccade Size
3.4.3 Statistical Analysis
4. Experiment 1 - Moving Through a Hallway
4.1 Objective
4.2 Experimental Design and Conditions
4.3 Data Analysis and Results
4.4 Conclusion
5. Experiment 2 - Fixation Stability
5.1 Objective
5.2 Experimental Design and Conditions
5.3 Data Analysis and Results
5.4 Conclusion
6. Experiment 3 - Handwashing
6.1 Objective
6.2 Experimental Design and Conditions
6.3 Data Analysis and Results
6.3.1 Fixation Durations
6.3.2 Saccade Size
6.3.3 Major Sub-tasks
6.3.4 "Look-aheads"
6.4 Conclusion
7. Experiment 4 - Making a Selection From a Vending Machine
7.1 Objective
7.2 Experimental Design and Conditions
7.3 Data Analysis and Results
7.3.1 Fixation Durations
7.3.2 Saccade Size
7.3.3 Comparison of Handwashing and Vending Machine Experiments

7.3.5 "Look-aheads"
7.4 Conclusion
8. Conclusion and Recommendations
8.1 The Effect of a Real Environment
8.2 Eye Movements Extended Over Time
8.3 Applications for Artificial Systems
8.4 Recommendations for Future Work
Appendix

List of Figures

Figure 2-1 Humans have a perceptual bias toward seeing the triangle as a whole
Figure 2-2 Scanpaths are task dependent
Figure 2-3 The distribution of rods and cones is unevenly distributed across the retina...
Figure 2-4 Vergence eye movement
Figure 2-5 Experimental conditions for sufficiency of retinal information
Figure 2-6 The effect of freeing the head on fixation stability
Figure 2-7 The effects of freeing the head on saccades
Figure 2-8 Blocks copying task
Figure 2-9 Eye movement strategies used for blocks copying task
Figure 2-10 Tapping vs. Looking only
Figure 2-11 Fixation durations for expert and novice drivers, for several conditions
Figure 2-12 Fixation duration as a function of task difficulty for a driving task
Figure 3-1 Portable, wearable eye-tracking headgear
Figure 3-2 Optics module (HMO) of headgear
Figure 3-3 Top view of headgear

Figure 3-6 Close up front panel of control unit
Figure 3-7 Subject wearing eye-tracking gear, ready to perform an experiment
Figure 3-8 Image of pupil (white outline) and corneal reflection (black outline)
Figure 3-9 Calculation of the line of gaze
Figure 3-10 Diffraction pattern used for calibration
Figure 3-11 Real Data Capture
Figure 3-12 Trace of vertical eye position
Figure 3-13 Trace of horizontal eye position
Figure 3-14 Expanded view of vertical eye position
Figure 3-15 Expanded view of horizontal eye position
Figure 3-16 Eye-tracker noise, no averaging
Figure 3-17 Eye-tracker noise, 2 field averaging
Figure 3-18 Eye-tracker noise, 4 field averaging
Figure 3-19 Eye-tracker noise, 8 field averaging
Figure 3-20 Calculation of visual angle from field of view
Figure 3-21 The gamma density function with B = 1 and A = 2
Figure 4-1 Simulated condition set-up for Exp. 1
Figure 4-2 First hallway for Exp. 1
Figure 4-3 Second hallway for Exp. 1
Figure 4-4 Third hallway for Exp. 1
Figure 4-5 Fourth hallway for Exp. 1
Figure 4-6 Simulated/Active

Figure 4-8 Real/Active
Figure 4-9 Real/Passive
Figure 4-10 Fixation Durations - Simulated/Active - JB
Figure 4-11 Fixation Durations - Simulated/Active - JP
Figure 4-12 Fixation Durations - Simulated/Active - MA
Figure 4-13 Fixation Durations - Simulated/Active - RC
Figure 4-14 Fixation Durations - Simulated/Passive - JB
Figure 4-15 Fixation Durations - Simulated/Passive - JP
Figure 4-16 Fixation Durations - Simulated/Passive - MA
Figure 4-17 Fixation Durations - Simulated/Passive - RC
Figure 4-18 Fixation Durations - Real/Active - JB
Figure 4-19 Fixation Durations - Real/Active - JP
Figure 4-20 Fixation Durations - Real/Active - MA
Figure 4-21 Fixation Durations - Real/Active - RC
Figure 4-22 Fixation Durations - Real/Passive - JB
Figure 4-23 Fixation Durations - Real/Passive - JP
Figure 4-24 Fixation Durations - Real/Passive - MA
Figure 4-25 Fixation Durations - Real/Passive - RC
Figure 4-26 Fixation Durations - Strings of Length x - Simulated/Active - JB
Figure 4-27 Fixation Durations - Strings of Length x - Simulated/Passive - JB
Figure 4-28 Fixation Durations - Strings of Length x - Simulated/Active - JP
Figure 4-29 Fixation Durations - Strings of Length x - Simulated/Passive - JP
Figure 4-30 Fixation Durations - Strings of Length x - Simulated/Active - MA
Figure 4-31 Fixation Durations - Strings of Length x - Simulated/Passive - MA

Figure 4-32 Fixation Durations - Strings of Length x - Simulated/Active - RC
Figure 4-33 Fixation Durations - Strings of Length x - Simulated/Passive - RC
Figure 4-34 Fixation Durations - Strings of Length x - Real/Active - JB
Figure 4-35 Fixation Durations - Strings of Length x - Real/Passive - JB
Figure 4-36 Fixation Durations - Strings of Length x - Real/Active - JP
Figure 4-37 Fixation Durations - Strings of Length x - Real/Passive - JP
Figure 4-38 Fixation Durations - Strings of Length x - Real/Active - MA
Figure 4-39 Fixation Durations - Strings of Length x - Real/Passive - MA
Figure 4-40 Fixation Durations - Strings of Length x - Real/Active - RC
Figure 4-41 Fixation Durations - Strings of Length x - Real/Passive - RC
Figure 4-42 Saccade Size - Simulated/Active - JB
Figure 4-43 Saccade Size - Simulated/Active - JP
Figure 4-44 Saccade Size - Simulated/Active - MA
Figure 4-45 Saccade Size - Simulated/Active - RC
Figure 4-46 Saccade Size - Simulated/Passive - JB
Figure 4-47 Saccade Size - Simulated/Passive - JP
Figure 4-48 Saccade Size - Simulated/Passive - MA
Figure 4-49 Saccade Size - Simulated/Passive - RC
Figure 4-50 Saccade Size - Real/Active - JB
Figure 4-51 Saccade Size - Real/Active - JP
Figure 4-52 Saccade Size - Real/Active - MA
Figure 4-53 Saccade Size - Real/Active - RC
Figure 4-54 Saccade Size - Real/Passive - JB
Figure 4-55 Saccade Size - Real/Passive - JP

Figure 4-56 Saccade Size - Real/Passive - MA
Figure 4-57 Saccade Size - Real/Passive - RC
Figure 4-58 Fixation Duration vs. Saccade Size - Simulated/Active - JB
Figure 4-59 Fixation Duration vs. Saccade Size - Simulated/Passive - JB
Figure 4-60 Fixation Duration vs. Saccade Size - Simulated/Active - JP
Figure 4-61 Fixation Duration vs. Saccade Size - Simulated/Passive - JP
Figure 4-62 Fixation Duration vs. Saccade Size - Simulated/Active - MA
Figure 4-63 Fixation Duration vs. Saccade Size - Simulated/Passive - MA
Figure 4-64 Fixation Duration vs. Saccade Size - Simulated/Active - RC
Figure 4-65 Fixation Duration vs. Saccade Size - Simulated/Passive - RC
Figure 4-66 Fixation Duration vs. Saccade Size - Real/Active - JB
Figure 4-67 Fixation Duration vs. Saccade Size - Real/Passive - JB
Figure 4-68 Fixation Duration vs. Saccade Size - Real/Active - JP
Figure 4-69 Fixation Duration vs. Saccade Size - Real/Passive - JP
Figure 4-70 Fixation Duration vs. Saccade Size - Real/Active - MA
Figure 4-71 Fixation Duration vs. Saccade Size - Real/Passive - MA
Figure 4-72 Fixation Duration vs. Saccade Size - Real/Active - RC
Figure 4-73 Fixation Duration vs. Saccade Size - Real/Passive - RC
Figure 5-1 Set-up for Condition 1: Small Screen
Figure 5-2 Set-up for Condition 2: Large Screen
Figure 5-3 Set-up for Condition 3: Real Walking
Figure 5-4 Fixation stability for target on right for subject JB
Figure 5-5 Fixation stability for target straight ahead for subject JB

Figure 5-7 Fixation stability for target straight ahead for subject JK
Figure 5-8 Fixation stability for small screen condition
Figure 5-9 Fixation stability for large screen condition
Figure 5-10 Fixation stability for real walking condition
Figure 6-1 Men's washroom used for Exp. 3
Figure 6-2 Women's washroom used for Exp. 3
Figure 6-3 Fixation Durations - Handwashing - Subject AS
Figure 6-4 Fixation Durations - Handwashing - Subject JB
Figure 6-5 Fixation Durations - Handwashing - Subject JP
Figure 6-6 Fixation Durations - Handwashing - Subject MH
Figure 6-7 Fixation Durations - Not Including Wash Hands - Subject AS
Figure 6-8 Fixation Durations - Not Including Wash Hands - Subject JB
Figure 6-9 Fixation Durations - Not Including Wash Hands - Subject JP
Figure 6-10 Fixation Durations - Not Including Wash Hands - Subject MH
Figure 6-11 Saccade Size - Handwashing - Subject AS
Figure 6-12 Saccade Size - Handwashing - Subject JB
Figure 6-13 Saccade Size - Handwashing - Subject JP
Figure 6-14 Saccade Size - Handwashing - Subject MH
Figure 6-15 Saccade Size - Not Including Wash Hands - Subject AS
Figure 6-16 Saccade Size - Not Including Wash Hands - Subject JB
Figure 6-17 Saccade Size - Not Including Wash Hands - Subject JP
Figure 6-18 Saccade Size - Not Including Wash Hands - Subject MH
Figure 6-19 Handwashing - Subject AS
Figure 6-20 Handwashing - Subject JB

Figure 6-21 Handwashing - Subject JP
Figure 6-22 Handwashing - Subject MH
Figure 6-23 Handwashing - Subject AS
Figure 6-24 Handwashing - Subject JB
Figure 6-25 Handwashing - Subject JP
Figure 6-26 Handwashing - Subject MH
Figure 6-27 Elapsed time between object fixation and object manipulation for Handwashing - Subject AS
Figure 6-28 Elapsed time between object fixation and object manipulation for Handwashing - Subject JB
Figure 6-29 Elapsed time between object fixation and object manipulation for Handwashing - Subject JP
Figure 6-30 Elapsed time between object fixation and object manipulation for Handwashing - Subject MH
Figure 6-31 JB's first fixation on garbage can
Figure 6-32 JB's first manipulation of garbage can, 7.1 seconds later
Figure 6-33 Handwashing - Subject AS
Figure 6-34 Handwashing - Subject JB
Figure 6-35 Handwashing - Subject JP
Figure 6-36 Handwashing - Subject MH
Figure 7-1 The alcove with the four vending machines used in Experiment 4
Figure 7-2 Coffee machine
Figure 7-3 Soda machine
Figure 7-4 Candy/chip machine
Figure 7-5 Sandwich machine
Figure 7-6 Fixation Durations - Vending Machine - Subject AS

Figure 7-7 Fixation Durations - Vending Machine - Subject JB
Figure 7-8 Fixation Durations - Vending Machine - Subject JT
Figure 7-9 Fixation Durations - Vending Machine - Subject MH
Figure 7-10 Saccade Size - Vending Machine - Subject AS
Figure 7-11 Saccade Size - Vending Machine - Subject JB
Figure 7-12 Saccade Size - Vending Machine - Subject JT
Figure 7-13 Saccade Size - Vending Machine - Subject MH
Figure 7-14 Vending Machine - Soda Machine - Subject AS
Figure 7-15 Vending Machine - Sandwich Machine - Subject JB
Figure 7-16 Vending Machine - Candy/Chip Machine - Subject JT
Figure 7-17 Vending Machine - Coffee Machine - Subject MH
Figure 7-18 Vending Machine - Subject AS
Figure 7-19 Vending Machine - Subject JB
Figure 7-20 Vending Machine - Subject JT
Figure 7-21 Vending Machine - Subject MH
Figure 7-22 Elapsed time between object fixation and object manipulation for Vending Machine - Subject AS
Figure 7-23 Elapsed time between object fixation and object manipulation for Vending Machine - Subject JB
Figure 7-24 Elapsed time between object fixation and object manipulation for Vending Machine - Subject JT
Figure 7-25 Elapsed time between object fixation and object manipulation for Vending Machine - Subject MH
Figure 7-26 Vending Machine - Subject AS
Figure 7-27 Vending Machine - Subject JB
Figure 7-28 Vending Machine - Subject JT

Figure 7-29 Vending Machine - Subject MH
Figure 7-30 JB looks at exit of candy/chip machine at timecode of 00:08:05:02
Figure 7-31 JB retrieves purchase from machine at timecode of 00:08:11:26, 6.8 seconds later
Figure 7-32 MH looks at coffee cup lids at a timecode of 00:10:03:27
Figure 7-33 MH looks at coffee cup lids at a timecode of 00:10:40:10
Figure 7-34 MH retrieves coffee cup lid at a timecode of 00:10:56:27, 53 seconds after first locating the lids, and 16.5 seconds after a second look to the lids
Figure 7-35 Fixation Durations and Subtasks - Vending Machine - Subject AS
Figure 7-36 Fixation Durations and Subtasks - Vending Machine - Subject JB
Figure 7-37 Fixation Durations and Subtasks - Vending Machine - Subject JT
Figure 7-38 Fixation Durations and Subtasks - Vending Machine - Subject MH
Figure 7-39 Fixation Durations and Subtasks - Handwashing - Subject AS
Figure 7-40 Fixation Durations and Subtasks - Handwashing - Subject JB
Figure 7-41 Fixation Durations and Subtasks - Handwashing - Subject JP
Figure 7-42 Fixation Durations and Subtasks - Handwashing - Subject MH
Figure 7-43 Saccade Size and Subtasks - Vending Machine - Subject AS
Figure 7-44 Saccade Size and Subtasks - Vending Machine - Subject JB
Figure 7-45 Saccade Size and Subtasks - Vending Machine - Subject JT
Figure 7-46 Saccade Size and Subtasks - Vending Machine - Subject MH
Figure 7-47 Saccade Size and Subtasks - Handwashing - Subject AS
Figure 7-48 Saccade Size and Subtasks - Handwashing - Subject JB
Figure 7-49 Saccade Size and Subtasks - Handwashing - Subject JP
Figure 7-50 Saccade Size and Subtasks - Handwashing - Subject MH

List of Tables

Table 2-1 Some computations can be simplified by making assumptions about behavior
Table 3-1 Example of code from eye-tracking experiment
Table 7-1 Comparison of fixation durations for handwashing and vending machine

1. Introduction

1.1 Overview

The purpose of vision is to serve the needs of the individual. As an individual goes about performing day-to-day activities, the visual system is continually monitoring the environment to provide information about each interaction; information that enables meaningful interactions with that environment for the fulfillment of a plan of action.

In this sense, vision is not a passive process whereby information is merely collected, processed, and stored for later retrieval, but rather an active process that integrates goal-oriented behavior with proprioceptive signals from the individual's physical state, and exteroceptive information about the layout of the environment.

Visual perception is essentially a selective process. The particular sequence of selections is largely dependent upon the task to be performed, and as such is driven by the goals of the individual, but each discrete selection occurs mostly at a subconscious level. Eye movements are one of the mechanisms by which the selection process proceeds. The

angle is surrounded by a low resolution periphery. An eye movement is required to bring an object of interest to the fovea, and is the means for sustaining overt attention on the object.

The purpose of eye movements thus appears to be that of allowing for the impression of a broad, high resolution visual field from multiple sequential fixations. This observation, based on the verifiable physiology of the human eye, does not adequately offer an answer to a central question regarding the role of eye movements in visual perception: where is attention to be focused next?

This research effort is largely concerned with providing a framework within which that question may be approached. There is obviously not a single answer that will apply to every situation requiring focused visual attention, but it is possible to extract a certain amount of commonality in everyday tasks that gives rise to particular patterns of selection.

A primary objective of this research is to study eye movements of subjects while they perform everyday tasks in a natural environment. Much of what is currently understood about human eye movements, and also about visual perception in general, is based on psychophysical studies conducted in the confines of a laboratory setting. Since humans did not evolve their sensory-perceptual abilities in such a restricted environment, it is valid to question whether or not the results obtained from such studies apply in a practical sense. It is also possible that subjects may exhibit an unconscious bias while performing in a laboratory, providing results that are valid in the laboratory, but not necessarily so in the world outside of it.

The psychophysical "black-box" approach to studying eye movements in isolation, and not in the context of a rich, interactive environment, suffers from other methodological

input can be isolated from all other possible inputs, its effect on the outcome can be precisely measured and quantified. All such inputs, when taken together, describe the system response. In this context, linearity refers to the idea that the whole is equal to the sum of its parts. The assumption of linearity as applied to human eye movements has not been adequately shown to be valid. A secondary goal of this research project is to provide grounds either for or against that assumption. The means for doing this is provided by the RIT portable, wearable eye-tracker, which was developed for the purpose of studying subjects' eye movements while they are performing common, everyday tasks in a natural, unrestricted environment.
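As an illustration only (the notation is not taken from the original text), the linearity assumption can be written as a superposition property. Here $R$ is a hypothetical system response, $x_1$ and $x_2$ are two isolated inputs, and $a$ and $b$ are arbitrary weights:

```latex
% Superposition: the response to a weighted sum of inputs equals
% the same weighted sum of the responses to each input alone.
\[
  R(a\,x_1 + b\,x_2) \;=\; a\,R(x_1) + b\,R(x_2)
\]
```

If eye movements during natural tasks violated this property, responses measured for isolated inputs in the laboratory would not add up to the response observed in the full environment.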

A portable, head-mounted eye-tracker was used for this research, as well as hardware and software that enabled a computation of the line-of-gaze for a subject who is wearing the eye-tracker and performing tasks in a natural environment. The line-of-gaze is displayed as a cursor superimposed on a video scene of the environment as seen by the subject. Data analysis of the cursor position as a function of time correlates to eye movements and affords an indirect method of determining the cognitive processes underlying visual perception.
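To make the idea of analyzing cursor position over time concrete, the following is a minimal sketch of one generic way a gaze-cursor trace could be grouped into fixations. It uses a dispersion-threshold rule, a common technique in eye-tracking analysis, and is not the procedure described in this thesis. The function name `detect_fixations`, the `(t, x, y)` sample format, and the threshold values are all assumptions made for illustration.

```python
def detect_fixations(samples, max_dispersion=1.0, min_duration=0.1):
    """Group time-ordered gaze samples into fixations (hypothetical helper).

    samples: list of (t_seconds, x, y) tuples for the gaze cursor.
    Returns a list of (start_t, end_t, mean_x, mean_y) tuples.
    """
    def dispersion(window):
        # Spread of the window: horizontal range plus vertical range.
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window that spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # too few samples remain to span min_duration
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay tightly clustered.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            mean_x = sum(s[1] for s in window) / len(window)
            mean_y = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], mean_x, mean_y))
            i = j + 1
        else:
            # The window contains a rapid shift; drop its first sample.
            i += 1
    return fixations
```

The units of `max_dispersion` follow whatever units the cursor coordinates use (pixels or degrees of visual angle), so the threshold would need to be chosen with the recording set-up in mind.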

A final objective of this research was to consider the implications of a sequential fixation strategy and of non-uniform sampling, or foveation, for an artificial vision system. Researchers in robotics and computer vision often consider the human visual system as a model for artificial vision systems. It would be beneficial to be able to describe the high-level cognitive processes underlying visual perception in a way that would be amenable to a computer program. Active computer vision is an area of current research, and much work

those algorithms.

In summary, the objectives of this research project are three-fold: to consider the effects of freeing the subject from the restraints of the laboratory setting during eye-tracking experiments, to develop a framework for describing the temporal sequencing of fixations across tasks as well as within tasks, and to evaluate the appropriateness of such a framework for serving as a model for an artificial vision system. Since the purpose of vision is to serve the needs of the individual, it seems reasonable to conclude that a hypothesis about where to look next can best be formulated by considering the cooperative relationship between vision

1.2 Objectives (Statement of Work)

Following are the objectives mandated for this research:

a) Conduct a literature review of the subject. The relevant topics are: eye movements, visuo-motor coordination, selective attention, plan schemata, active vision, and animate vision.

b) Design a series of experiments to monitor subjects' eye movements as they perform a range of common, everyday tasks selected to gain an understanding of the interaction between vision and action. Such tasks include:

   i) Walking along a corridor, being pushed in a wheelchair, and watching a videotape of someone walking along a corridor or being pushed in a wheelchair

   ii) Maintaining fixation on an object while walking along a corridor

   iii) Washing one's hands in a lavatory

   iv) Making a selection from a vending machine

c) Recruit subjects and carry out the experimentation.

d) Analyze the data collected in terms of eye movement metrics. Examples of such metrics are: fixation duration, number of fixations, saccade length, saccades per second, etc.

e) Study the results to determine the pre-conscious strategies used by individuals as they performed the tasks.

f) Modify and repeat the experimentation and data analysis as necessary to investigate any interesting or emergent pattern of oculomotor behavior.

g) Formulate conclusions based on results. Demonstrate the usefulness of results, and
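The metrics named in objective (d) can be illustrated with a short sketch. It assumes that fixations have already been identified and are available as time-ordered records. The `Fixation` class, its field names, and the `summarize` function are hypothetical names chosen for this example and do not come from the original analysis. A saccade is approximated here as the jump between two consecutive fixations.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    start_s: float  # fixation onset, in seconds
    end_s: float    # fixation offset, in seconds
    x_deg: float    # horizontal gaze position, in degrees of visual angle
    y_deg: float    # vertical gaze position, in degrees of visual angle


def summarize(fixations):
    """Return basic eye-movement metrics for a time-ordered list of fixations."""
    if not fixations:
        raise ValueError("need at least one fixation")
    durations = [f.end_s - f.start_s for f in fixations]
    # Saccade length: distance between consecutive fixation positions.
    saccades = [
        hypot(b.x_deg - a.x_deg, b.y_deg - a.y_deg)
        for a, b in zip(fixations, fixations[1:])
    ]
    total_time = fixations[-1].end_s - fixations[0].start_s
    return {
        "n_fixations": len(fixations),
        "mean_fixation_duration_s": sum(durations) / len(durations),
        "mean_saccade_length_deg": sum(saccades) / len(saccades) if saccades else 0.0,
        "saccades_per_second": len(saccades) / total_time if total_time > 0 else 0.0,
    }
```

A call such as `summarize([Fixation(0.0, 0.3, 0.0, 0.0), Fixation(0.35, 0.65, 3.0, 4.0)])` returns a dictionary with one entry per metric, which makes it straightforward to compare subjects or conditions.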

2. Background

2.1 Historical Perspective

In 1867 Hermann von Helmholtz published his thoughts on the nature of visual perception in a book entitled Treatise on Physiological Optics (Helmholtz, 1867/1925). This work laid the foundation for the classical approach to the philosophical treatment of vision known as constructivism. The goal of constructivism was to explain visual perception as arising from the confluence of many local information processing units, which when combined together, construct a global percept of the world. A central tenet of modern constructivism is the belief that perception relies upon a process of unconscious inference. In other words, in order for local information to be bound with other local information in a meaningful way, an inference must be made about the most likely interpretation.

An example of how unconscious inference could be used to explain perception is shown in Figure 2-1. Two possible interpretations of image A are shown as image B and

inference to explain the human perceptual bias of choosing image B as the correct interpretation. Image B is chosen because it is the most likely possibility.

[Figure 2-1: three panels labeled A, B, and C]

Figure 2-1. Humans have a perceptual bias toward seeing the triangle as a whole.

The inference is largely unconscious in that the observer is generally not aware that probabilities are being compared, and that logical inferences are being made.

A constructivist approach to the inverse problem (that is, the problem of how 2-D retinal images are transformed into a perception of the 3-D environment) would be to consider the 2-D retinal image as belonging to the most likely state of affairs in the environment that would give rise to such an image.
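One way to make "most likely state of affairs" precise is a probabilistic reading, sketched below. This is a modern Bayesian interpretation offered as an assumption for illustration; the symbols $S$ (a candidate 3-D scene), $I$ (the 2-D retinal image), and $\hat{S}$ (the chosen interpretation) do not appear in the original text.

```latex
% The chosen interpretation is the scene with the highest probability
% given the image. The second form follows from Bayes' rule, because
% P(I) does not depend on S.
\[
  \hat{S} \;=\; \arg\max_{S} P(S \mid I)
          \;=\; \arg\max_{S} P(I \mid S)\,P(S)
\]
```

Under this reading, image B in Figure 2-1 is preferred because a whole triangle partly hidden from view is a more probable cause of image A than the alternative arrangement.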

In contrast to the constructivist theory of unconscious inference, an ecological perspective was espoused by James Gibson (Gibson, 1966), who argued that direct perception of the environment is sufficient for solving the inverse problem. He believed that all visual perception is the result of the interaction between the observer and surfaces, or more specifically the light reflected off surfaces, in the environment. Surfaces are composed of texture elements, and it is the structure that exists in the surfaces that in turn structures the light that reaches the eye of the observer. When the observer moves around the surfaces, the changing ambient optic array of light reaching

Thus, the inverse problem is solved by considering the movement of the observer as integral to the reconstruction. Change in structure over time supplies the missing dimension.

In the late 1970s David Marr (Marr, 1982) combined the theoretical constructs from both constructivism and ecological perception to create the first computational approach for describing visual processes. He used mathematical techniques to develop computer programs that simulated biological vision, and led the early efforts of computational and computer scientists who designed the first machine vision systems.

Marr disagreed with Gibson, however, on the issue of representation. Gibson held that the environment is the repository for all of the information that is necessary for visual interaction, whereas Marr believed that the external world is represented internally, in all of its detail. An example of the internal representation is what Marr calls the "2½ dimension" sketch, an internal retinotopic image with the potential for a 3-D representation.

Marr's work has had a strong influence on the current understanding of early vision, and this understanding has led to a number of computational approaches based on early, or low-level, biological vision. It is assumed that in order to simulate a process as complex as high-level visual perception, one must begin with, and correctly implement, the lower level processes. Only then will the "correct" way to implement the higher-level cognitive processes become apparent.

Ballard and Brown pointed out several weaknesses to this approach (Ballard & Brown, 1992). First, early visual processes do not take into account the motivation of the observer. Marr's treatment of the visual process as purely passive precludes a

of cognition as the driving force behind the collection of low-level information, instead of thinking of it as merely the result of a collection of responses. Second, the early vision approach does not take into account sequentialization and gaze control that humans use to make efficient use of the multi-resolution capabilities of the human eye. Finally, Marr's model does not make use of learning strategies or adaptational responses to the environment. His model of perception is essentially a rich, highly detailed, task independent description of the world, which is continually being called upon by cognition for performing specific tasks. Ballard and Brown (1992) describe an alternate way of approaching the complexities imposed by vision, and suggest numerous simplifications that would result from taking behavioral assumptions into account. Their findings, which are exemplified by a construct called animate vision, are summarized in Table 2-1 below.

Computations Simplified by Behavioral Assumptions

Agent's Behavior      Behavioral Assumption
Shape from shading    Light source not directly behind viewer
Time to adjacency     Rectilinear motion; gaze in the direction of motion
Kinetic depth         Lateral head motion while fixating a point in a stationary world
Color homing          Target object is distinguished by its color spectrum
Optic flow            Texture-rich environment
Stereo depth          System can fixate environmental points
Edge homing           Target position can be described by approximate directions from texture in its surround
Object tracking       Vergence can be used to improve tracking performance


Another objection to the early-vision approach toward computational vision is suggested by the work conducted by Yarbus in the 1960s. Yarbus showed how high-level cognitive events are reflected in the patterns of eye-movement traces (Yarbus, 1967). He found that different patterns of eye-movement traces, or scan-paths, could be elicited from subjects when they performed context-sensitive tasks. For example, when subjects were shown a painting depicting a scene of several people greeting an unexpected visitor, a specific question posed to the subjects elicited a specific "signature" pattern of eye movements. Different questions elicited different "signature" patterns. Figure 2-2 below shows the painting and typical scanpaths for a subject formulating an answer to the various questions.

[Figure 2-2 panels: Original painting; Free viewing; Estimate the economic level of the people; Judge their ages; Guess what they had been doing before the visitor's arrival; Remember the clothes worn by the people.]

Figure 2-2. Scanpaths are task dependent. From Palmer, 1999 and Yarbus, 1967.

The observation that oculomotor behavior is largely task dependent leads one to [...] of the observer. David Lee has suggested that information processing by humans should be considered in the context of a unified perceptuo-motor system, which is itself a part of the organism-environment system (Lee, 1978, 1980). His ideas pertaining to the functions of vision are an extension of the ecological perceptual model set forth by Gibson a decade earlier. In his view, the human visual system must be studied not only in an environmental context, but also in the context of the individual's sensory-motor system. Vision is functionally inseparable from the motor system. Information becomes available to the individual via three separate sources: exteroceptive, proprioceptive, and exproprioceptive. The exteroceptive source delivers information about the layout and affordances of the environment. The proprioceptive source delivers information about the position, orientation, and movement of the body or parts of the body. The exproprioceptive source delivers information about the union of the exteroceptive and proprioceptive sources, that is, information about the movement of the body relative to the environment. The exproprioceptive information represents the interaction between the individual and space over time.

Taken together, the three sources provide the means for a cooperative relationship to exist between vision and action. Goal-oriented behavior, planning, and decision-making all play a significant part in the visual perception experienced by the individual.

To summarize the history of formal thinking about the nature of human visual perception, the constructivist and computational early-vision approaches taken by Helmholtz and Marr emphasize the autonomy of the individual and unconscious mechanisms to guide the visual perceptual process. This is the foundation for many of the current linear-systems methods for teasing apart the factors that influence perception. [...] the interaction between the individual and the environment according to goals, actions, motivation and behavior. For them, trying to understand how vision works by studying subjects' responses to artificial stimuli in a laboratory setting is like trying to understand how fish swim by putting them in a sandbox. From this point of view, then, the factors that have had a major evolutionary influence on vision and that have largely shaped human visual perception are precisely those factors that are missing from the laboratory setting.

2.2 Eye Movements

The binocular visual field subtends an area approximately 130° vertically and 180° horizontally. Most of that area contains low-resolution peripheral information. In order to obtain detailed, high-resolution information from different areas in the environment, the eyes must move. The purpose of an eye movement is to bring the most visually relevant part of a scene onto the area of the retina with the highest visual acuity, and to keep it there during focused attention. This area is called the fovea and subtends approximately one degree of visual angle, covering an area of the visual field approximately equal to the size of a thumbnail extended at arm's length. Attention can then be re-deployed to another area in the visual field to initiate the next eye movement.
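As a rough check on the thumbnail comparison above, the visual angle subtended by an object can be computed from its size and its viewing distance. The following is a minimal sketch only: the function name `visual_angle_deg`, the thumbnail width (1.5 cm), and the arm's length (70 cm) are assumed, illustrative values and are not taken from this thesis.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle in degrees subtended by an object of the given size,
    viewed face-on and centered, at the given distance."""
    if size_cm < 0 or distance_cm <= 0:
        raise ValueError("size must be non-negative and distance positive")
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Assumed values: a thumbnail about 1.5 cm wide held about 70 cm from the eye.
thumbnail_angle = visual_angle_deg(1.5, 70.0)
```

With these assumed dimensions the result is a little over one degree, which is in line with the size quoted for the fovea.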

The photoreceptors of the human eye consist of both rods and cones, the cones being the photoreceptors responsible for color perception and visual acuity. As shown in Figure 2-3, the population of cones is highest in the fovea and falls off rapidly toward the periphery. There is a 1:1 or greater correspondence between photoreceptors and ganglion cells in the fovea; however, this ratio increases continuously along the periphery. This fact, along with the higher concentration of cones in the fovea, accounts for the higher visual acuity there.

[Figure 2-3 plot: density of rods and cones as a function of Visual Angle (degrees from fovea), with the horizontal axis running from 80 degrees on one side of the fovea to 80 degrees on the other.]

Figure 2-3. Rods and cones are unevenly distributed across the retina. The fovea contains the highest concentration of cones, for high visual acuity in that region. From Palmer, 1998.

Traditionally, eye movements have been classified into six categories:

1. Miniature eye movements - These are the only type of eye movements that do not have a selective function. They include tremors in the extraocular muscles that control rotation of the eyes in their spherical socket, drift of the foveated image, and microsaccades to bring the drifted image back to the fovea. The result is constant motion of the optical image on the retina.

2. Saccades - These are high velocity, ballistic eye movements that have the function of bringing images of objects of interest to the fovea. It is generally believed that once a saccadic eye movement has begun, it cannot be altered. A typical saccade takes approximately 150 - 200 msec to plan and execute; planning takes about 150 msec on average, and the duration of the eye movement is approximately 20 msec plus 2 msec per degree of visual angle (Carpenter, 1988). Saccades can reach velocities up to 600° per second, and individuals typically make 3 or 4 saccades per second, depending on the [...] Studies on eye movements during reading have shown that saccades during reading are typically seven letters long, which is a saccade length of between 1° and 2° for reading standard size text at a distance of 40 cm (O'Regan, 1990). It has also been found that there is a wide distribution of within-word target landing positions for reading text. In other words, there is no precise position within the word that the eye targets the saccade to land on; anywhere within the word is sufficient for comprehension (Morgan et al., 1990). Fixations are defined as the time between successive saccades; a typical fixation duration for reading is between 200 and 300 msec.

3. Smooth pursuit - These eye movements track the position of a moving object, with the purpose of keeping the image in the foveal region. Ideally, the image remains stationary on the retina. After an initial saccade to track the moving object, the eye movement is smooth and continuous, as opposed to the abruptness of saccades. Constant correction of image position on the fovea is maintained by means of a feedback signal from the brain that senses the position of the object as it moves. Thus smooth pursuit cannot usually be maintained in the absence of a moving target. The maximum velocity is approximately 100° per second; target velocities higher than that cause retinal slippage and disable the tracking mechanism. During pursuit, the image of the pursued object is clear, with all other untracked objects smeared due to their relative motion on the retina.

4. Vergence - When an observer fixates an object, the eyes converge toward one another, with the degree of convergence depending upon the distance between the observer and the object. Vergence eye movements are disconjugate in the sense that the eyes rotate opposite to one another. For a conjugate movement such as pursuit, the eyes rotate in the same direction. If an object is moving both in depth and in direction, a disconjugate [...]

Figure 2-4. Vergence eye movement

5. Vestibular - When the head rotates, the vestibular ocular reflex (VOR) allows us to fixate an object in the environment without visual feedback. The information necessary to control eye movements when the head moves originates in the vestibular system of the inner ear, which senses the orientation of the head. Vestibular eye movements are faster than pursuit movements; however, high velocity head movements such as those encountered while running or walking fast cannot be fully compensated for by a vestibular eye movement (Palmer, 1999). When this happens, objects in the environment that require high visual acuity for perception (such as lettering on signs) will appear blurred.

6. Optokinetic - An optokinetic eye movement is a response to the rapid translation of the entire visual field, or a large part of it. For example, if an observer is looking through a window at a train passing by, fixating and tracking a spot on the train will cause the observer to exhibit the optokinetic response. It is characterized by a slow tracking phase in which the image is stabilized on the retina, followed by a rapid, saccade-like snap of the eyes in the direction opposite to the image motion. This is known as [...]

Recent studies have suggested that there are actually only two categories of eye movements: saccadic and smooth (Steinman, Kowler, and Collewijn, 1990). The claim is that the classification into six categories is artificial, a result of the early laboratory methods that studied simple tasks in a constrained and sparse visual environment. The experimental results of such early studies reflected the low-level and involuntary aspects of oculomotor control, and were simply responses to sensory cues. They did not reflect the cognitive processes, such as expectation, motivation, and learning, that people typically employ while engaged in natural tasks.
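The saccade timing figures quoted earlier in this section (Carpenter, 1988) amount to a simple linear relationship between saccade amplitude and duration, which can be sketched as follows. The constants are the approximate values given in the text; the function names are illustrative, and the mean-velocity helper is an added convenience rather than something reported in the source.

```python
PLANNING_MS = 150.0        # average planning time quoted in the text
BASE_DURATION_MS = 20.0    # fixed part of the movement duration
MS_PER_DEGREE = 2.0        # added duration per degree of visual angle


def saccade_duration_ms(amplitude_deg):
    """Approximate duration of the eye movement itself, in msec."""
    if amplitude_deg < 0:
        raise ValueError("amplitude must be non-negative")
    return BASE_DURATION_MS + MS_PER_DEGREE * amplitude_deg


def saccade_total_ms(amplitude_deg):
    """Approximate time to plan and execute a saccade, in msec."""
    return PLANNING_MS + saccade_duration_ms(amplitude_deg)


def mean_velocity_deg_per_s(amplitude_deg):
    """Mean velocity implied by the linear duration model (illustrative)."""
    return amplitude_deg / (saccade_duration_ms(amplitude_deg) / 1000.0)


# A 10-degree saccade under this approximation.
duration_10 = saccade_duration_ms(10.0)
total_10 = saccade_total_ms(10.0)
velocity_10 = mean_velocity_deg_per_s(10.0)
```

Under this approximation a 10-degree saccade takes about 40 msec to execute and about 190 msec in total, which falls inside the 150 - 200 msec range given above.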

2.3 Visual Attention and Selection

The mechanics of oculomotor behavior do not explain how the selection process is controlled. Questions such as "what is the region of interest?" and "where should the next fixation be?" can best be answered within a framework that considers the purpose of focused attention.

2.3.1 Saliency Maps

The notion of a saliency map was proposed to define the relationship between the components of a scene according to their relative importance to the observer (Mahony and Ullman, 1988). According to this theory, the visual system performs an initial low-frequency parsing of the environment to identify potential regions of interest, and assigns to each region a weight according to its saliency. Corners, high luminance, and bright colors, for example, would be assigned a high saliency weight. This information is recorded in a map of the environment, which is a record of the weight of each region. The map is dynamic in the sense that recent targets are depressed as the individual moves around in the environment to [...]
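The weighting and depression of regions described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the region names, the weights, and the depression factor of 0.2 are hypothetical, and the source does not specify how weights are computed or how strongly recent targets are depressed.

```python
def next_fixation(saliency):
    """Return the region that currently has the highest weight."""
    return max(saliency, key=saliency.get)


def depress(saliency, region, factor=0.2):
    """Return a copy of the map with a recently visited region's weight
    scaled down. The factor is an assumed value."""
    updated = dict(saliency)
    updated[region] = updated[region] * factor
    return updated


def scan(saliency, n_fixations):
    """Pick the most salient region, depress it, and repeat."""
    order = []
    current = dict(saliency)
    for _ in range(n_fixations):
        target = next_fixation(current)
        order.append(target)
        current = depress(current, target)
    return order


# Hypothetical weights for three regions of a scene.
regions = {"corner": 0.9, "bright_patch": 0.7, "plain_wall": 0.1}
order = scan(regions, 3)
```

Because a visited region is depressed rather than removed, it can be selected again once its reduced weight exceeds the weights of the remaining regions.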

2.3.2 Feature Integration Theory of Attention

What is the purpose of focused attention? According to feature integration theory, elementary features in the environment such as color and shape are processed before objects that require a conjunction of several features, such as a blue box or a gray kitten. Focused attention is necessary to conjoin the separate features, which then enables proper identification of the object (Treisman & Gelade, 1980).

The studies Treisman and Gelade conducted were based on the experimental paradigm known as visual search. In this paradigm, the amount of time it takes to complete a search is plotted as a function of the number of items to be searched. A flat response indicates a fast, parallel process, whereas a linear response indicates a slower sequential process. Since eye movements are inherently sequential, a task that requires eye movements would elicit longer search times for a larger number of items and a linear response.
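The distinction between a flat and a linear search response can be illustrated by fitting a straight line to search time as a function of the number of items and inspecting its slope. The sketch below uses made-up reaction times, and the 10 msec-per-item threshold separating "parallel" from "sequential" is an assumption chosen for illustration, not a value from the studies cited.

```python
def search_slope(set_sizes, reaction_times_ms):
    """Least-squares slope of search time against set size (msec per item)."""
    n = len(set_sizes)
    if n < 2 or n != len(reaction_times_ms):
        raise ValueError("need at least two matched observations")
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    return sxy / sxx


def classify_search(slope_ms_per_item, threshold=10.0):
    """Label a search as parallel (flat) or sequential (linear).
    The threshold is an assumed cut-off."""
    return "parallel" if abs(slope_ms_per_item) < threshold else "sequential"


# Hypothetical data: number of items and mean search times in msec.
set_sizes = [1, 5, 15, 30]
popout_times = [450, 452, 449, 451]         # roughly flat
conjunction_times = [460, 560, 810, 1185]   # grows by 25 msec per item

popout_slope = search_slope(set_sizes, popout_times)
conjunction_slope = search_slope(set_sizes, conjunction_times)
```

With these invented numbers the first data set is classified as parallel and the second as sequential.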

The experiments were designed to distinguish between features that are elementary, or integral, and features that are separable and require focused attention for integration. They hypothesized that an integral feature would elicit a flat search response and would exhibit "pop-out" in a field of distractors, whereas an object with separable features would require a linear search response. Their results showed this to be the case when the elementary features were chosen to be colors or shapes (for example "pink" in a field of "brown" and "purple" distractors, or "O" in a field of "N" and "T" distractors) and the separable features were chosen to be a conjunction of the two elementary features (such as "pink O" in a field of "green O" and "pink N" distractors).

Both the saliency map theory and the feature integration theory describe perception as being the result of low-level and early-vision processes. Oculomotor behavior is a response [...]

2.4 The World as Anchor

2.4.1 Semantic Consistency

When subjects are shown a line drawing of a natural scene that contains either a semantically consistent object (a tea kettle in a kitchen) or a semantically inconsistent object (a microscope in a kitchen), they are quicker to locate the consistent object, when asked to search for it, than they are to locate the inconsistent object (Henderson et al., 1999). Moreover, the initial saccade is equally likely to be directed to the consistent object as to the inconsistent object. Since the inconsistent object would seem to have a higher salience than the consistent object, the saliency map framework for early visual processing is either wrong or incomplete. A determination of semantic consistency necessarily takes into account the relevancy of a particular object in its surroundings, and this is not considered as part of the saliency map model.

2.4.2 Change Blindness

Change blindness refers to the phenomenon that occurs when large-scale changes in the visual scene go undetected by the observer as the result of a blink, a saccade, or some other visual transient. This has been explained by suggesting that attention is being prevented from being focused on the change because of the distraction caused by the visual transient. In other words, the change blindness could be due to a masking, or resetting, of the internal representation of the world (Rensink, O'Regan, and Clark, 1995). It has also recently been found that small random changes in the scene, such as that due to a mud-splash on a car windshield, can also result in change blindness (O'Regan, Rensink, and Clark, 1999). Not only are mental images unreliable, but the internal representation is quite sparse and contains only the information about the environment that is of central interest. This [...] encoding visual primitives and binding them together, cognition dictates what is actually preserved in and retrieved from memory. It may be that it is a more efficient strategy to use the world as an external memory source, and only encode the information that currently has meaning.

2.4.3 Exocentric Reference Frames

The notion of "world-as-anchor" can best be summed up by saying that we are perceptually predisposed to seeing the world around us as stable, despite large changes in eye and body position that displace the image on the retina significantly.

When a small afterimage is viewed in darkness except for the glow of a small, stationary reference light, and the eye moves, the afterimage appears to move relative to the reference light. When the afterimage is large (complex scene), it appears to remain stationary when the eye is moved, and the reference light instead appears to move, even though the subject knows the reference light is actually stationary (Pelz and Hayhoe, 1995). When the subjects were instructed to inspect the afterimage and made large saccades (up to 5°), the large afterimage still did not appear to move. This was explained by suggesting that whole-scene afterimages carry more perceptual "weight" than do small, isolated patches of light in a darkened room. The large afterimage creates an external reference frame, or anchor, that allows for visual stability and constancy of visual direction.

2.4.4 Position Constancy During Passive Movement

Position constancy refers to the perception that the environment does not appear to move when the eyes, head, or body moves, even though the image on the retina is displaced. Irvin Rock (1967) found that external frames of reference are used to maintain position [...]

He seated subjects, blindfolded, in a small motorized wagon and started the wagon moving. He disguised the effect of the acceleration of the wagon by telling the subjects to expect a small amount of jostling of the equipment while the experiment was being set up. He then sent the wagon rolling along a darkened hallway and removed the blindfold. The only objects visible to the subject were small, luminous circles, placed along the wall of the hallway so that only one circle was visible to the subject at a time. The subjects were asked to report what they were experiencing. Seventeen of the 20 subjects reported that they were stationary and the circles were moving past them. He then repeated the experiment with different subjects, changing the luminous circles to luminous vertical lines. This time the subjects were able to see all of the lines, which filled the visual field. Twelve of the 20 subjects experienced the lines as stationary and themselves as moving. Rock explained this by saying that the lines provided a frame of reference for the subjects that enabled the correct perception. The results from this study showed that for subjects who are passively moving through their environment, position constancy can be maintained by having an external frame of reference.

2.5 Perceiving the Direction of Heading During Motion

Position constancy is not the only issue relating to the perception of a stable world in the presence of image motion on the retina. Perceiving one's direction of heading while making whole body movements, head movements, and eye movements is critical for survival in the world, and is a natural ability for humans.

2.5.1 Retinal vs. Extra-retinal Information

In the 1980s and early 1990s researchers considered the question of how people [...] In this case the flow field, which results from the changing structure of the ambient optic array as the observer moves around, must be decomposed into both a translational and a rotational component. It was assumed that the rotational component due to an eye or head movement was effectively canceled out prior to the determination of heading. Several hypotheses have been proposed to explain how the rotational component could be canceled out.
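The decomposition of the flow field into translational and rotational components can be made concrete with a standard pinhole-camera model of instantaneous image motion. The sketch below is not taken from the studies discussed here: it assumes a unit focal length, a particular sign convention, and illustrative names (`t` for the observer's translation, `w` for the eye or head rotation, `depth` for the distance to a scene point). It shows that the rotational part does not depend on depth, and that once it is removed the remaining flow vanishes at the point in the image corresponding to the direction of heading.

```python
def translational_flow(x, y, depth, t):
    """Image velocity at (x, y) caused by observer translation t = (tx, ty, tz)
    for a scene point at the given depth (unit focal length assumed)."""
    tx, ty, tz = t
    return ((-tx + x * tz) / depth, (-ty + y * tz) / depth)


def rotational_flow(x, y, w):
    """Image velocity at (x, y) caused by rotation w = (wx, wy, wz).
    Note that depth does not appear."""
    wx, wy, wz = w
    u = wx * x * y - wy * (1.0 + x * x) + wz * y
    v = wx * (1.0 + y * y) - wy * x * y - wz * x
    return (u, v)


def retinal_flow(x, y, depth, t, w):
    """Total image velocity: the sum of the two components."""
    ut, vt = translational_flow(x, y, depth, t)
    ur, vr = rotational_flow(x, y, w)
    return (ut + ur, vt + vr)


# Assumed example: heading slightly to the right while the eye rotates
# about the vertical axis.
heading = (0.2, 0.0, 1.0)      # translation (tx, ty, tz)
rotation = (0.0, 0.1, 0.0)     # rotation (wx, wy, wz)
heading_point = (heading[0] / heading[2], heading[1] / heading[2])
flow_at_heading = translational_flow(heading_point[0], heading_point[1],
                                     5.0, heading)
```

In this model, cancelling the rotational component amounts to subtracting `rotational_flow` from the total flow, which requires knowing the rotation either from the image itself or from an extra-retinal signal.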

The retinal image theory claims that there is enough information in the retinal image alone to accurately predict direction in the presence of head or eye movements (Warren and Hannon, 1988). The extra-retinal theory claims that proprioceptive information, and possibly an efference copy of the eye command, is necessary to make an accurate determination of direction (Royden, Banks, and Crowell, 1992).

Both theories base their claims on the results of an experimental setup that requires test subjects to view a random-dot display of simulated motion on a video screen. There are two parts to the experiment. For the first part, subjects initially fixate a central target, then pursue the target as it moves laterally across the screen. For the second part, the subjects again fixate a central target, and continue to fixate the target as the display changes to simulate a lateral eye movement. The resulting flow on the retina should be the same for both cases.

[Figure 2-5 panels: Flow field on screen; Flow field on retina; Real Eye Movement; Simulated Eye Movement.]

Figure 2-5. Experimental conditions for sufficiency of retinal information. In a) the subject was instructed to fixate the cross then make an eye movement in the direction of the arrow. In c) the [...]

The major difference between the two studies was that the retinal image proponents used slow speeds for the real and the simulated eye movements (0.2 - 1.2° per second), whereas the extra-retinal proponents used faster speeds (1 to 5° per second). At the end of each 1250 msec trial, the subjects were asked to state their perceived direction of heading. Warren & Hannon instructed the subjects to indicate their perceived direction of heading by having them state whether they felt as if they were headed to the right or to the left of a vertical target line placed on the horizon of the last frame of the display after the motion had stopped. Royden et al. had the subjects state which one of seven equally spaced (4° apart) targets was closest to the perceived direction of heading after the motion had stopped.

The retinal image proponents found no difference in perceived direction for the real or simulated eye movement. This suggests that all of the information that is required to perceive direction is present in the retinal image. Interestingly, the extra-retinal proponents discovered that there was a significant difference in perceived direction for the real and simulated eye movements. They found that the subjects could not tell in which direction they were headed without making a real eye movement. When the eye movement was simulated on the display screen (by sweeping the dot pattern laterally across the screen), they felt as if they were moving along a curvilinear path, rather than straight ahead. This is evidence that extra-retinal information is necessary for determining heading. It appears that the speed of the eye movements might be a reason for the discrepancy between the results. [...] from the retinal flow pattern without any extra-retinal input, whereas faster speeds require the extra information.

2.5.2 Differential Motion Parallax

In 1992 James Cutting disputed the hypothesis that moving observers decompose retinal flow into translational and rotational components. He maintained that retinal flow in its entirety is sufficient for determining heading, in the form of differential motion parallax. He argued that the earlier studies did not include the components of bounce and sway that people experience when they move at a pedestrian speed. He reasoned that if decomposition were taking place and these components were included in the experimental conditions, subjects would find it much more difficult to determine their heading direction, because the additional decomposition due to these components would complicate the process of perception. He found that subjects were equally able to determine their heading direction with or without the added components of bounce and sway, and concluded that individuals use retinal information directly in the form of differential motion parallax.

Neither retinal decomposition nor differential motion parallax considers the possibility that subjects may be using the environment as an external frame of reference, in much the same sense as was shown for position constancy and exocentric frames of reference for afterimages.

2.6 The Effects of Freeing the Head

The studies on heading perception discussed in the previous section were conducted in the confines of a laboratory setting. The subjects' eye movements were monitored with a head-mounted limbus eye-tracker as they watched a simulated display of [...] Since the human visual system did not evolve motion perception capabilities in this type of setting, it seems reasonable to conclude that the results may differ when natural movement through the real world is considered.

Traditionally, most studies of oculomotor behavior have relied upon eye-movement recording devices that required the head to be immobilized during the experiment. Usually a chin-rest or bite board was used. A head-restraining mechanism is used because, in order for an accurate measurement of maintained fixation to be made, the device must be able to distinguish between motion of the eye in the head and motion of the tracker with respect to the head. If the tracker moves while the subject is maintaining fixation, an eye movement will appear to have been made, when in reality the eye may not have moved at all. It is necessary to keep the head secured to eliminate any motion of the tracker relative to the head in order to determine when the eye is rotating.

Early eye movement monitors such as the contact-lens optical lever, the magnetic field sensor coil, and the SRI Dual Purkinje Image Tracker required immobilization of the head (Kowler, 1995). Researchers generally assumed that fixations made with the head immobilized would be the same in terms of stability as fixations made when the head was free to move but did not move. It has subsequently been discovered that this assumption is incorrect (Skavenski et al., 1979). When subjects maintained fixation on a distant target, retinal image stabilization decreased when the head was not supported, as shown below in Figure 2-6 for two subjects.

When the head is free to move but does not, image motion on the retina can be as much as 2 or 3 degrees per second. Visual perception is insensitive to this type of motion, and it has been suggested that vision can actually be impaired when the head is not free to [...] image motion, to make the task of perception less taxing, much in the same way that saccade target position during reading can be very imprecise within a word, yet comprehension does not suffer. Developers of robust robotic vision might well consider the wide tolerances of human vision to be a model for systems that require the synthesis of large amounts of data for the performance of complex tasks.

[Figure 2-6 panel labels: Subject A; Subject B; BITE-BOARD; SITTING.]

Figure 2-6. The effect of freeing the head on fixation stability. The vertical lines represent 1 second

intervals. Theverti