Moving through language: a behavioural and linguistic analysis of spatial mental model construction

Parente, Fabio (2016) Moving through language: a behavioural and linguistic analysis of spatial mental model construction. PhD thesis, University of Nottingham.

Access from the University of Nottingham repository:

http://eprints.nottingham.ac.uk/37620/1/Fabio%20Parente%20-%20PhD%20thesis.pdf

Copyright and reuse:

The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

This article is made available under the University of Nottingham End User licence and may be reused according to the conditions of the licence. For more details see:

http://eprints.nottingham.ac.uk/end_user_agreement.pdf

MOVING THROUGH LANGUAGE:

A BEHAVIOURAL AND LINGUISTIC ANALYSIS

OF SPATIAL MENTAL MODEL CONSTRUCTION

Fabio Parente, BA, MRes

Thesis submitted to the University of Nottingham for the degree of Doctor of Philosophy

December 2016

ABSTRACT

Over the past few decades, our understanding of the cognitive processes underpinning our navigational abilities has expanded considerably. Models have been constructed that attempt to explain various key aspects of our wayfinding abilities, from the selection of salient features in environments to the processes involved in updating our position with respect to those features during movement. However, there remain several key open questions. Much of the research in spatial cognition has investigated visuospatial performance on the basis of sensory input (predominantly vision, but also sound, hapsis, and kinaesthesia), and while language production has been the subject of extensive research in psycholinguistics and cognitive linguistics, many aspects of language encoding remain unexplored.

The research presented in this thesis aimed to explore outstanding issues in spatial language processing, tying together conceptual ends from different fields that have the potential to greatly inform each other, but focused specifically on how landmark information and spatial reference frames are encoded in mental representations characterised by different spatial reference frames. The first five experiments introduce a paradigm in which subjects encode skeletal route descriptions containing egocentric (“left/right”) or allocentric (cardinal) relational terms, while they also intentionally maintain an imagined egocentric or allocentric viewpoint. By testing participants’ spatial knowledge either in an allocentric (Experiments 1-3) or in an egocentric task (Experiments 4 and 5) this research exploits the facilitation produced by encoding-test congruence to clarify the contribution of mental imagery during spatial language processing and spatial tasks. Additionally, Experiments 1-3 adopted an eye-tracking methodology to study the allocation of attention to landmarks in descriptions and sketch maps as a function of linguistic reference frame and imagined perspective, while also recording subjective self-reports of participants’ phenomenal experiences. Key findings include evidence that egocentric and allocentric relational terms may not map directly onto egocentric and allocentric imagined perspectives, calling into question a common assumption of psycholinguistic studies of spatial language. A novel way to establish experimental control over mental representations is presented, together with evidence that specific eye gaze patterns on landmark words or landmark regions of maps can be diagnostic of different imagined spatial perspectives.

Experiments 4 and 5 adopted the same key manipulations to the study of spatial updating and bearing estimation following encoding of short, aurally-presented route descriptions. By employing two different response modes in this triangle completion task, Experiments 4 and 5 attempted to address key issues of experimental control that may have caused the conflicting results found in the literature on spatial updating during mental navigation and visuospatial imagery. The impact of encoding manipulations and of differences in response modality on embodiment and task performance was explored.

Experiments 6-8 subsequently attempted to determine the developmental trajectory for the ability to discriminate between navigationally salient and non-salient landmarks, and to translate spatial relations between different reference frames. In these developmental studies, children and young adolescents were presented with videos portraying journeys through virtual environments from an egocentric perspective, and were tested on their ability to translate the resulting representations in order to perform allocentric spatial tasks. No clear facilitation effect of decision-point landmarks was observed, nor was there any strong indication that salient navigational features are more strongly represented in memory within the age range we tested (four to 11 years of age). Possible reasons for this are discussed in light of the relevant literature and methodological differences.

Globally, the results presented indicate a functional role of imagery during language processing, pointing to the importance of introspection and accurate task analyses when interpreting behavioural results. Additionally, the study of implicit measures of attention such as eye tracking measures has the potential to improve our understanding of mental representations, and of how they mediate between perception, action, and language. Lastly, these results also suggest that synergy between seemingly distinct research areas may be key in better characterising the nature of mental imagery in its different forms, and that the phenomenology of imagery content will be an essential part of this and future research.

ACKNOWLEDGMENTS

I am grateful to my supervisors, Alastair D. Smith and Ruth Filik, for their support and guidance over the past few years. They have kept me grounded during my frequent flights of fancy, challenged me, and helped me navigate (some degree of pun intended) a very complex research area. I would also like to extend my gratitude to the University of Nottingham and those of its students, both undergraduate and postgraduate, who have agreed to take part in my experiments over the years. Additionally, I would like to thank the parents and the children who have attended Summer Scientist Week, and who have contributed to making it an exceptional opportunity both for data collection and for science outreach within the local community. A special thank you to Alastair for tirelessly coordinating the organisation of the event.

On a more personal note, I want to thank the many people whose presence in my life has made all the difference both during and before my doctorate. Alexandra, Beerelim, and Lawrence, and all the other PhD students and postdocs (Dominic and Prasannah deserve a special mention and a pint) – for being there and sharing in all the good, bad, and frustrating times, and for the countless stimulating conversations.

The biggest thank you, and all my love, goes to my family for their unwavering support and patience during the most challenging, exciting, beautiful, and exhausting four years of my life. Your ability to believe in me even when I cannot is my greatest source of strength, and I hope I can make you as proud as you make me.

Last, but not least, Ms Binotti and Ms Neri – pillars of my high-school experience – deserve recognition and the accolade of Most Patient Teachers in the History of Teachers. I drove you nuts, yet you never failed to push me to do better. If I am taking these first steps in hope of becoming a capable researcher, it is only because you were able to keep my love for science alive even when school tried to kill it. Whatever mess I make in academia from now on is on you.

Author’s declaration

I hereby declare that the work contained within this thesis was carried out in accordance with the Regulations of the University of Nottingham. The work is original except where indicated by special reference in the text, and no part of this work has been submitted for examination as part of any other degree, in the United Kingdom or overseas. All views expressed in this thesis are those of the author and in no way represent those of the University of Nottingham.

I dedicate this thesis to my grandmother, Pina. You unknowingly led me here even as you were losing yourself, and gave me the strength and motivation to pursue a dream I had cast aside. I will be forever grateful.

CHAPTER 1

Navigation and Mental Imagery

1.1. Overview

Humans are inherently spatial creatures, and our survival is ultimately conditional on our ability to interact meaningfully and efficiently with our surroundings. Whether we are reaching for a glass of water located on the desk next to our laptop, walking across the room to reach a pile of papers in a bookcase, or walking into town to run an errand, this crucial ability involves a complex interplay of different cognitive mechanisms. These include the identification of salient features in the environment (e.g. a bright red post box), and the construction of mental representations of environmental space that can merge object identity with location and distance information (e.g. a bright red post box located at a T junction, a few hundred metres down the road from our house). Finally, active navigation requires the planning, execution, and online monitoring of motor behaviour (e.g. which way do we turn to get to the T junction, and which way do we turn there with respect to the post box to reach our destination?).

However, much of our usual navigational behaviour can often involve heading towards more distal goals that lie beyond our immediate perceptual field. In such situations, the spatial representations that we require to plan our motor behaviour must be informed by our long-term knowledge of the environment in which we are operating. Planning a route between two buildings located on opposite sides of one’s university campus, for example, requires an understanding of the relative spatial positions of the two locations and of the potentially salient navigational landmarks that might be located between them, as well as knowledge of the network of roads connecting them. There are, however, situations in which even our long-term memory cannot be relied upon to guide navigation. The exploration of a novel environment, such as a town we are not familiar with, might require us to operate on the basis of information provided to us via linguistic propositions. Whether we are asking a stranger for directions or reading a series of route directions on the Web, we will need to extract navigational information from the linguistic content and generate on that basis an appropriate mental model of an environment we cannot directly perceive. Depending upon the type and richness of the information provided, the resulting mental model may display more or less detailed visuospatial properties and may be perceived as phenomenologically analogous to the active exploration of the real environment. The research presented in this thesis aims to explore, at least in part, the nature of and interactions between the various cognitive processes involved in the construction of said spatial mental models and visual mental images from linguistic input when allothetic (i.e. optic flow) and idiothetic (i.e. proprioception) cues generated by active motion during navigation are not available. This body of work is principally concerned with exploring the way in which linguistic manipulations and imagery manipulations interact with each other both during the encoding of spatial linguistic content and during subsequent performance of spatial tasks.

Chapters 1 and 2 will lay the theoretical foundations for the work presented in this thesis, discussing and introducing a number of notions necessary to ground and interpret the following experiments and central to an understanding of spatial cognition. These are: spatial reference frames (Section 1.2), landmarks, and landmark salience (Section 1.3). Subsequently, I will discuss how these components might be integrated within mental representations that can support navigation (Sections 1.4-1.10), and what factors (individual differences and environmental factors) might drive the selection of certain representations over others (Section 1.11). In Chapter 2, this body of work will be framed within the context of the processes that might underlie the transfer of information between external representations (e.g. maps, or linguistic descriptions of space) and internal representations thereof (Section 2.1). Accordingly, research will be reviewed that has explored the interaction between the human language faculty and navigational abilities (Section 2.2), providing a theoretical motivation for studying this interaction. This will then be followed by a discussion of the factors (cognitive and linguistic) that can influence encoding during language processing and the construction of mental representations (Section 2.3). Similarly, the factors influencing the production of external representations will be discussed, particularly with respect to the role played by representational congruency between encoding and test (Section 2.5). Last but far from least, eye movements will also be discussed as potential windows into the construction of mental representations of space during the processing of spatial language, and into mental imagery in general (Section 2.6). The information presented in this section will be paramount for a complete understanding of Experiments 1-3, presented in Chapter 3, and will additionally introduce elements that are central to the broader theoretical framework of this thesis, such as the susceptibility of eye movements to top-down effects during reading and scene processing.

1.2. Spatial Reference Frames

A fundamental requirement of successful navigation is the ability to encode the position of objects and environmental features within cognitive structures that can support both immediate navigation and the formation of enduring spatial representations in long-term memory. This ability relies on the use of spatial coordinate systems onto which spatial locations can be anchored.

Traditionally, the spatial cognition literature has distinguished between egocentric (or body-centred) and allocentric (or geocentric) reference frames. This distinction has hinged largely on three aspects: the type of input required to generate them, the type of cognitive processes and spatial tasks they support, and the developmental and cognitive hierarchy in which they are structured. Early developmental models (Piaget & Inhelder, 1967) postulated that the ontogenesis of spatial abilities in children followed a set of sequential milestones, with early reliance on egocentric representations and a qualitative shift towards more complex allocentric representations upon onset of independent locomotion. This model was later expanded into a more general model of spatial microgenesis (Siegel & White, 1975), which assumed a stepwise acquisition of three categories of environmental knowledge. Landmark knowledge concerns the identity of salient and stable environmental features, or discrete objects, and is based on egocentric reference frames. Route knowledge involves an egocentric understanding of the paths connecting the various landmarks, and of the sensorimotor sequences that allow navigation between them. It is initially non-metric and improves with repeated exposures to the environment (Ishikawa & Montello, 2006; Montello, 1998). Survey knowledge is a map-like, allocentric representation of the global environment that can support the plotting of alternative routes and shortcuts.

An egocentric frame of reference codes spatial relations on a coordinate system centred on the organism itself. This type of reference frame is thought to be the one most readily constructed on the basis of sensory input during active navigation. As a result, the encoding (and, consequently, the recall) of visuospatial information within an egocentric frame of reference is also orientation-specific and viewpoint-dependent. Egocentric representations are considered less flexible and are primarily used to support perception-driven navigation in near or peripersonal space. As we walk towards and reach for an object, for example, we must construct a motor program that will first direct our legs to move in its general direction, and then our arm and hand towards it.

However, not all navigational (or, more generally, visuospatial) behaviour relies on the processing of spatial relations within a body-centred frame of reference. Updating self-object spatial relations during movement on the basis of idiothetic input (a process known as egocentric path integration) is thought to be subject to cumulative error over increasing distances (Burgess, 2008). During instances of navigation in larger environments and over longer distances, allocentric representations are usually preferred. An allocentric reference frame encodes the position of objects in an environment not with respect to the navigator’s body, but with reference to each other or to other stable environmental features on a set of coordinates centred on the global environment itself. This type of spatial relation coding is fundamental, for example, in the process of maintaining a stable heading while moving towards a more distal location in extrapersonal space, and it is central to many models of visuospatial long-term memory (Burgess, 2006; 2008).
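The contrast between the two frames can be made concrete with a small worked example. The sketch below is purely illustrative and is not part of the experimental materials presented in this thesis. It assumes a flat, two-dimensional environment, a heading expressed in degrees clockwise from north, and two hypothetical helper functions (`egocentric_to_allocentric` and `allocentric_to_egocentric`). Under these assumptions, it shows that the egocentric coordinates of an object change whenever the navigator turns, whereas its allocentric coordinates do not.

```python
import math


def egocentric_to_allocentric(position, heading_deg, ego_offset):
    """Convert a body-centred (right, forward) offset into (east, north) coordinates.

    position    -- navigator's (east, north) location in the environment
    heading_deg -- heading in degrees clockwise from north (assumed convention)
    ego_offset  -- object's (right, forward) offset relative to the navigator's body
    """
    theta = math.radians(heading_deg)
    right, forward = ego_offset
    east = position[0] + forward * math.sin(theta) + right * math.cos(theta)
    north = position[1] + forward * math.cos(theta) - right * math.sin(theta)
    return (east, north)


def allocentric_to_egocentric(position, heading_deg, world_point):
    """Convert an (east, north) location into a body-centred (right, forward) offset."""
    theta = math.radians(heading_deg)
    d_east = world_point[0] - position[0]
    d_north = world_point[1] - position[1]
    # Project the displacement onto the navigator's forward and rightward axes.
    forward = d_east * math.sin(theta) + d_north * math.cos(theta)
    right = d_east * math.cos(theta) - d_north * math.sin(theta)
    return (right, forward)


# Hypothetical scene: a post box 10 m north of the navigator's position.
post_box = (0.0, 10.0)
navigator = (0.0, 0.0)

# Facing north, the post box is straight ahead (forward = 10, right = 0).
print(allocentric_to_egocentric(navigator, 0.0, post_box))

# Facing east, the same post box lies about 10 m to the left (right is roughly -10).
print(allocentric_to_egocentric(navigator, 90.0, post_box))
```

The allocentric location of the post box is the same in both calls; only the body-centred description changes with heading, which is the sense in which egocentric encoding is orientation-specific.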

Over time, evidence has emerged to challenge the assumption of a stepwise acquisition and hierarchical organisation of reference frames, both in microgenetic and ontogenetic terms (discussed in more detail in Chapter 5, in which three developmental studies are presented). First of all, the egocentric perceptual experience we perceive as unitary is already the result of the synthesis of sensory input originating in different intrinsic reference frames (Galati, Pelle, Berthoz & Committeri, 2010). In order to code and update the relative position of our body and limbs during motion with respect to the reference object we must rely on predominantly egocentric sensory experiences, such as visual, somatosensory, vestibular, and auditory input. At the lowest level, these sensory inputs are acquired within slightly different body-centred coordinate sets. Optic flow input is first used to plot the necessary spatial relations within retinotopic coordinates (i.e. the object in question might appear in the lower-right quadrant of our visual field) (Török, Nguyen, Kolozsvári, Buchanan & Nadasdy, 2014). Auditory and vestibular input is acquired in head-centred coordinates, and proprioceptive information in body-centred coordinates. In a series of processing stages, these various inputs must be integrated by shifting and merging the receptive fields of different neuronal populations into a single coherent reference frame that can support navigation (Avillac, Denève, Olivier, Pouget & Duhamel, 2005; Fogassi & Lupino, 2005) and that we perceive as egocentric.

Additionally, the idea that our spatial understanding of a novel environment is initially purely egocentrically constrained has also failed to stand up to further scrutiny. Montello (1998) proposed an alternative theoretical framework according to which spatial knowledge acquisition follows a continuous trend. Rather than progressing through qualitatively different stages, knowledge of distances between environmental locations should be above chance already after early exposures and increase continuously as a function of experience with the environment. However, the extent and accuracy of said spatial knowledge (as well as the rate of improvement over time) will be a function of individual differences. Additionally, this continuous framework posited that integrating spatial knowledge of separate environments acquired during distinct navigational events into a single allocentric knowledge structure represents the only real qualitative step during spatial microgenesis.

Ishikawa and Montello (2006) tested this framework by exposing a sample of university students to two routes in unfamiliar neighbourhoods (Figure 1.1) over 10 weekly sessions. Participants were driven along the routes, and along a shorter path connecting them. During the first three sessions, they wore blindfolds while travelling circuitously between the two test routes. Starting from the fourth session, they were driven along a direct connecting route without blindfolds, in order to allow them to integrate their knowledge of the two routes into a single mental representation. After each session, participants carried out direction and straight-line distance estimation tasks between pairs of landmarks, and after every other session they drew sketch maps of the routes, including their shapes, the spatial relation between them, and the four landmarks encountered on each one. Following exposure to the connecting route, participants were probed on both within- and between-route direction and distance estimates. Additionally, participants took the Santa Barbara Sense of Direction (SBSOD) self-report scale (Hegarty, Richardson, Montello, Lovelace & Subbiah, 2002). Results showed that already after a first exposure, participants were able to acquire landmark, route, and survey knowledge that included above-chance awareness of metric knowledge (understood as quantitative but approximate knowledge of distances between locations), confirming one of the predictions of Montello’s (1998) framework.

Very little group-level improvement was observed between the first and the 10th session for within-route tasks, whereas participants’ understanding of the connection between the two test routes (as evidenced by the maps drawn after the fourth session) showed more evidence of improvement (and was reported by participants to be more challenging than other tasks). Analyses of individual participants’ data, however, showed evidence of considerable between-subject variability, also consistent with predictions. Good and poor performers could be distinguished consistently from the first session onwards, and approximately half of the participants showed slight evidence of improvement over time. Interestingly, participants’ SBSOD scores were found to positively correlate with their performance in direction and distance estimates, and in the map drawing tasks. However, this was only the case following exposure to the more complex U-route (see Figure 1.1), which involved multiple changes in heading, and for between-route direction estimates which required the construction of a more complex survey representation of the environment containing both routes. This confirmed multiple predictions of the alternative framework: that route complexity and individual differences would modulate performance, and that integrating spatial representations of distinct routes into a single one would represent a qualitative step in the acquisition of spatial knowledge.

In a more recent study, Ishikawa (2013) presented participants with a video of an urban route containing five turns and five landmarks (counterbalanced between turn and non-turn locations). After watching the video, participants were then tested on four spatial tasks. In a landmark memory task, participants had to list the names of the five landmarks encountered, in order of appearance. In a route-choice task, participants were shown five egocentric snapshots of intersections and asked whether they remembered turning at that location during encoding, and in what direction. In the direction estimation task, participants estimated the spatial relationships between the five landmarks encountered (for a total of 10 pairs). In the map-sketching task, participants were asked to draw as accurate a map of the learned route as possible. Additionally, half of participants repeated the spatial tasks after 2 weeks, and half after 3 months from exposure, in both cases without watching the video a second time.
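The figure of 10 pairs in the direction estimation task follows from counting the unordered pairs that can be formed from five landmarks. The short sketch below merely illustrates this count; the landmark labels are placeholders and not those used by Ishikawa (2013).

```python
from itertools import combinations

# Placeholder labels standing in for the five landmarks along the route.
landmarks = ["A", "B", "C", "D", "E"]

# Every unordered pair of distinct landmarks is one direction estimate.
pairs = list(combinations(landmarks, 2))
print(len(pairs))  # 10
```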

Results showed differential patterns of memory decay for landmark, route, and survey knowledge as a function of sense of direction (as measured by participants’ SBSOD scores). More specifically, the two groups displayed comparable rates of rapid decay of landmark name recall and topological route knowledge, but individuals with a better self-reported sense of direction showed a significantly lower rate of decay of survey knowledge. It therefore appears that the use of allocentric representations of space is per se no more effortful than the construction and processing of egocentric representations. During active navigation, the sensory input acquired via different modalities can be merged and form the basis of viewpoint-dependent egocentric snapshots of events and locations. These are action-oriented representations of self-object spatial relations (Burgess, 2006), and the automatic use of visual, vestibular, and kinaesthetic input allows them to support spatial updating over short distances (Riecke, Cunningham & Bülthoff, 2007). Allocentric representations can also be generated based on sensory input after relatively short exposures to novel environments, and a tendency to favour either spatial reference frame is the result of a complex interplay of disparate factors. These include environmental features, the degree of motion involved, task demands, neurodevelopmental characteristics, sociogeographical differences, age, and others (see Section 1.11).

Egocentric and allocentric representations, however, are not only constructed in parallel but are also inherently interactive. This means that the way in which humans initially experience an environment can also influence the resulting long-term representations of that space. McNamara, Rump and Werner (2003) had participants learn the locations of eight objects located at the intersections of two paths encircling a large, rectangular building, and in the vicinity of a salient environmental landmark (a lake). One of the paths was aligned to the walls of the building, while the other was out of alignment by 45 degrees. Subjects subsequently had to inspect their mental representations of the environment and point to the target objects from imagined vantage points and headings. Pointing accuracy was greater after experiencing the environment from the aligned path compared to the misaligned one, indicating the fundamentally allocentric nature of participants’ representations. However, imagined headings aligned with the salient landmark also led to increased pointing accuracy, and this was taken as indication that the geocentric features used to construct intrinsic reference frames are selected on the basis of egocentric experience. Additionally, the results provided evidence of orientation-dependent alignment effects in otherwise allocentric spatial memories.

This deeply interactive system of parallel reference frames raises several important questions that are relevant to the current research. Namely, what processes mediate the construction of spatial representations based on linguistic input and how do egocentric and allocentric reference frames interact within this domain? Are both egocentric and allocentric representations constructed in parallel based on linguistic input? And, if that is the case, can experimental paradigms be developed that will allow us to determine, on the basis of dependent measures of linguistic encoding and visuospatial performance, the type of reference frame adopted in the construction of the underlying spatial representations? However, before addressing these questions, other fundamental notions must be discussed in more detail. Among them is the idea of landmark, which will be covered in the next section.

1.3. Landmarks and Spatial Learning

Previous research (Newcombe & Huttenlocher, 2003; Newcombe, Huttenlocher, Drummey & Wiley, 1998) has categorised the spatial coding systems we use to encode the locations and relative positions of entities in environments on the basis of the reference frame upon which they rely, of the type of spatial relations they encode, and of the behavioural complexity they can support. More specifically, spatial coding systems can be classified depending on whether they code spatial relations with respect to the self (and within an egocentric frame of reference) or with respect to external landmarks (and within an allocentric frame of reference).

The systems known as response learning and dead reckoning fall within the first group and require a constant awareness of one’s own position in space. The former involves the re-enactment of motor sequences whose accuracy in reaching a target depends on a constant starting point (e.g. reaching for the right-hand drawer when seated on one particular side of the desk), whereas the latter is a more complex system involving the integration of optic flow, vestibular, and kinaesthetic information in order to update one’s position.
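Dead reckoning can be illustrated with a minimal simulation. The sketch below is an assumption-laden toy model and not a description of any experiment in this thesis: it treats a journey as a sequence of (turn, distance) steps on a flat plane, measures heading in degrees clockwise from north, and, as a simplifying assumption, models imperfect idiothetic input as Gaussian noise added to the heading at each step. The function name `dead_reckon` and its parameters are hypothetical.

```python
import math
import random


def dead_reckon(steps, heading_noise_sd=0.0, seed=0):
    """Integrate (turn_deg, distance) steps into an estimated (east, north) position.

    steps            -- sequence of (turn in degrees, distance travelled) pairs
    heading_noise_sd -- standard deviation of per-step heading noise in degrees
                        (an assumed noise model, chosen only for illustration)
    seed             -- seed for the random number generator, for repeatability
    """
    rng = random.Random(seed)
    east = north = 0.0
    heading = 0.0  # degrees clockwise from north
    for turn_deg, distance in steps:
        # Each heading update inherits all earlier errors, so noise accumulates.
        heading += turn_deg + rng.gauss(0.0, heading_noise_sd)
        east += distance * math.sin(math.radians(heading))
        north += distance * math.cos(math.radians(heading))
    return east, north


# A square path: four 10 m legs with a 90-degree right turn before legs 2 to 4.
square = [(0.0, 10.0), (90.0, 10.0), (90.0, 10.0), (90.0, 10.0)]

# Without noise, the estimate returns to the starting point (up to rounding).
print(dead_reckon(square))

# With noisy heading updates, the estimated end point drifts away from the start.
print(dead_reckon(square, heading_noise_sd=5.0))
```

Because every heading update builds on the previous estimate, errors are carried forward rather than corrected, which is one way to picture why position updating of this kind is thought to degrade over longer distances in the absence of external landmarks.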

On the other hand, the location of a target object within an environment is often encoded with respect to other stable features (i.e. landmarks) within the environment itself. Given the importance of landmarks in guiding many instances of navigational behaviour, it is important also to construct a taxonomy of functions that they can assume. In this sense, a distinction can be made between landmarks used as associative cues for navigational actions and those used as beacons. Cue learning of spatial locations involves the direct association of a target object or location with a coincident landmark, provided that the association is habitual or otherwise stable over time. For example, one might keep wine glasses in the cupboard right above the sink. The association can also involve a landmark region rather than a landmark object. In this sense, both wine glasses and the sink are associated with a region of space located in one’s kitchen. However, in certain situations, no distinctive, coincident landmark may be available that can serve as an associative cue, such as when we are attempting to locate our car in a full parking lot. In such cases, place learning requires that the target object or location be encoded in terms of its distance and relative direction from more distal landmarks. These landmarks, or beacons, are defined as highly visible navigational objects that indicate or are target locations (Chan, Baumann, Bellgrove & Mattingley, 2012), providing highly accurate positional information even from a long distance and from all locations in the environment. A skyscraper or a church’s spire would be examples of target landmarks within an urban environment that might act as beacons.

In a study aimed at testing the relative advantages and disadvantages of beacon and associative cue navigation, Waller and Lippa (2007) had participants explore a virtual environment composed of 20 rooms in a linear sequence, each of which contained two doors. Only one would allow the participant to progress to the following room, and doors could either be marked by a single landmark placed between them (Associative Cue) or by two landmarks, each placed next to one door (Beacons). Additionally, a “No Landmark” condition was included to test for the facilitating effect of landmark presence. Over the course of several trials, participants navigated through the same environment, allowing the experimenter to record both the number of correct doors selected overall and the increase in accuracy over subsequent trials.

Results revealed that the presence of landmarks led to better performance compared to the No Landmark condition. However, the facilitating effect of landmark presence was modulated by the function of the landmarks, leading to greater increases in accuracy earlier in the experiment when they acted as beacons compared to when they acted as associative cues. That is, accuracy increased more quickly when participants could simply encode the identity of the landmarks to aim for in the various rooms. This, however, also translated into a poorer recall of directional information when landmarks were removed in the last trial, indicating that the need to only encode landmark identity during beacon navigation may lead to weaker consolidation of directional information.

However, a perhaps more fundamental issue than the function of landmarks in navigation is the nature of what constitutes a landmark in the first place. In spite of the central role of landmarks in guiding spatial navigation, no univocal definition of the term has been presented in the literature. This is perhaps an indication of the considerable flexibility with which landmark selection occurs. Stable environmental features are normally selected as navigational aids if they present a higher degree of salience compared to other environmental features. Although the determination of this salience is far from being a simple cognitive task, attempts have been made to determine both its neural and psychological underpinnings.

A number of studies have closed in on the neural circuitry that appears to be involved in responding to navigationally salient features of environments, while also providing behavioural correlates for landmark salience discrimination. Janzen and van Turennout (2004) studied the role of the parahippocampal gyrus (PHG) in encoding landmark objects during navigation. In an fMRI study they presented adult participants with videos of a route through a virtual environment and instructed them to remember both the route and the objects they encountered. These objects could be either toys or objects belonging to other semantic categories, and they could be located either at intersections (decision point objects) or at simple turns (non-decision point objects). Participants were further instructed to pay particular attention to the toys, in order to be able to guide a group of children along the tour. Following route learning, participants engaged in an object recognition task, during which they were shown previously encountered and novel objects of both semantic categories and asked to determine via button press whether they had seen the objects or not. During this phase, the objects were presented from a canonical orientation on a white background, to separate the recall of the object identities from that of the spatial information participants may have encoded during learning.

No significant differences were found in response accuracy rates as a function of semantic category or navigational salience. However, toys were responded to significantly more quickly than non-toys, and toys at decision points significantly more quickly than toys at non-decision points. On the other hand, response times did not differ as a function of navigational salience for objects in the non-toy semantic category, indicating that the navigational salience of landmarks may, to an extent, interact with other task-related top-down demands, such as instructions to attend to specific categories of landmarks. In neural terms,
navigational salience and semantic salience were found to be served by distinct neural mechanisms, with stronger activation in the right fusiform gyrus (BA 37) for attended objects (toys) compared to unattended objects (non-toys), and increased activation in the left and right parahippocampal gyri for decision-point objects compared to non-decision point objects. In the right PHG, the increased activation for decision-point objects was also found for forgotten objects (objects that were present in the videos, but that participants had incorrectly judged not to have seen). Globally, the results suggested that the encoding of navigational salience is automatic (present when participants are instructed to attend to objects based on non-navigational criteria), independent of spatial information requirements during retrieval (when objects are presented in isolation), and even of conscious recall of the object landmarks. The study also specifically implicated the PHG in the acquisition of object-place associations during route learning, and indicated that this form of learning requires limited exposure to the environment, allowing fast and dynamic changes to spatial maps during navigation. This was confirmed in a following study by Janzen, Wagensveld and van Turennout (2007), who exposed participants to different route sequences a different number of times. Results revealed that the number of exposures (one vs three) did not modulate the differential parahippocampal activation for decision-point objects compared to non-decision point objects. The representation of landmark salience was already stable after one exposure to the route, meeting an important requirement for a navigational system capable of quickly acquiring navigationally salient information and of maintaining it over time.

However, Janzen, Jansen and van Turennout (2008) observed that time from exposure and the resulting memory consolidation did influence hippocampal and parahippocampal activity, but that this effect was modulated by navigational ability. In that study, participants were presented with two route sequences through a virtual environment containing landmarks at both decision and non-decision points. As in previous studies, participants were instructed to explicitly attend to a specific class of objects, the toys, rather than the other objects, regardless of their spatial location. One route was presented the evening prior to the fMRI scanning session, and the other immediately before it. An object recognition task was performed during scanning as in previous studies, and participants indicated whether they had seen the presented objects in either of the two routes they had experienced. Participants were divided into good and bad navigators based on their score on the Santa Barbara Sense of Direction Scale (SBSOD), a self-report measure of navigational skills already introduced in Section 1.2 as a correlate of survey spatial abilities.

Behaviourally, error rates were higher for landmark objects encountered the evening before scanning, and lower for landmarks of the attended semantic category (toys). Additionally, toys at decision
points were recalled more accurately than those at non-decision points, but this effect of navigational salience was not present for non-toy objects. Attended objects also elicited faster responses than non-attended ones. An analysis of the fMRI data revealed that objects encountered the night prior to scanning elicited stronger bilateral hippocampal activity. This consolidation effect was positively correlated with participants’ SBSOD scores, with good navigators also displaying stronger responses in the PHG to consolidated decision-point landmarks compared to recently encountered ones.

The results of Janzen et al. (2008) pointed to a role of memory consolidation and individual differences in navigational salience perception, and strengthened the view that the PHG is involved in the enduring representation of navigationally salient landmark information. However, the mechanism via which this salience determination is carried out so that only useful information is stored remained to be elucidated. In a following study (Janzen & Jansen, 2010) this mechanism was more closely studied by confronting participants with ambiguous landmark information (i.e. instances in which potentially salient landmarks appear at two different decision points requiring two different directional turns). Participants actively explored a virtual environment containing objects they were explicitly instructed to attend to (toys) and objects belonging to other semantic categories. Each object appeared twice at two different decision points (D-D objects), at two different non-decision points (ND-ND objects), or at one decision and at one non-decision point (D-ND and ND-D objects, also “one-D objects”), for a total of 288 encounters. Active exploration was followed by an object recognition task (during fMRI scanning) that included both previously encountered and novel toys and non-toys. During this task, each object was presented only once, and participants had to judge whether they had encountered it during exploration of the environment or not. Behaviourally, D-D objects were found to elicit the most errors and ND-ND objects the fastest responses. Once again, attended objects yielded lower error rates and faster responses than unattended objects.

An analysis of the fMRI data showed that one-D objects elicited greater parahippocampal activity compared to ND-ND objects, irrespective of the semantic category of the objects and consistent with previous findings (Janzen et al., 2007; Janzen & van Turennout, 2004). On the other hand, D-D objects elicited greater activity than ND-ND objects in the right middle frontal gyrus, a prefrontal region implicated in cognitive control (Miller & Cohen, 2001), spatial working memory (Courtney, Petit, Maisog, Ungerleider, & Haxby, 1998), the selection of contextually relevant information (Ridderinkhof, Ullsperger, Crone, & Nieuwenhuis, 2004), and the detection of expectation violations (Corlett et al., 2004; Fletcher et al., 2001). Additionally, the middle frontal gyrus was found to respond more strongly to D-D objects associated with different directional turns compared to D-D objects associated with turns in the same direction. Globally, these findings suggest that the determination of navigational salience is a flexible process that is continuously informed by incoming input, and that conflicting or misleading information pertaining to navigationally salient regions of a route or environment activates areas involved in executive functions such as cognitive control.

The role of the PHG in the marking of navigationally salient landmarks was further explored by Wegman and Janzen (2011), who studied its resting state connectivity with other brain regions. As in previous studies, participants were shown a video of routes through four sections of a virtual environment containing landmark objects both at decision and non-decision points. Participants were instructed to learn the routes and to pay particular attention to objects of interest to children visiting the environment (i.e. toys). All objects appeared on posters located at decision points and non-decision points, and each section contained the same number of attended and unattended objects located at navigationally salient and non-salient points.

Unlike in previous studies, participants’ eye movements were recorded during the learning phase. These data were used to segment the fMRI recordings into sections corresponding to object viewing periods, each defined as the number of consecutive frames in which participants’ eye gaze was on the object’s coordinates. For each object, the video frame in which the object was no longer visible was taken as the offset of the object viewing trial. The eye gaze data also provided a measure of attention allocation. They revealed that participants spent longer looking at toys compared to objects belonging to other semantic categories, but also that toys located at non-decision points were fixated for longer than toys at decision points, and toys at non-decision points for longer than non-toys at non-decision points.
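The definition of a viewing period as a run of consecutive frames with gaze on the object can be made concrete with a minimal sketch. It is not the procedure actually used by Wegman and Janzen (2011): it assumes one gaze sample per video frame and, as a simplification, a static rectangular region for the object, and the name `viewing_periods` is hypothetical.

```python
def viewing_periods(gaze, box):
    """Return (onset_frame, n_frames) for each run of consecutive frames
    in which the gaze sample falls inside the object's bounding box.

    gaze: list of (x, y) samples, one per video frame; None marks a lost sample.
    box: (x_min, y_min, x_max, y_max) in screen coordinates (assumed static).
    """
    x_min, y_min, x_max, y_max = box
    periods = []
    run_start = None
    for frame, sample in enumerate(gaze):
        on_object = (
            sample is not None
            and x_min <= sample[0] <= x_max
            and y_min <= sample[1] <= y_max
        )
        if on_object and run_start is None:
            run_start = frame  # a new viewing period begins
        elif not on_object and run_start is not None:
            periods.append((run_start, frame - run_start))
            run_start = None
    if run_start is not None:
        # The recording ended while gaze was still on the object.
        periods.append((run_start, len(gaze) - run_start))
    return periods
```

Summing the frame counts returned for an object gives a simple total viewing time, which is the kind of measure that can then be compared across object categories and locations.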

In this study, fMRI recordings were made during route learning, and while participants performed a landmark recognition task. During learning, first fixations on decision-point landmarks were found to result in increased activity in the PHG compared to fixations on non-decision point landmarks. Relatedly, periods of looking at screen locations with objects corresponded to periods of increased activity in the PHG compared to looking at regions without objects, and increased PHG activation for an object was predictive of its successful recall during object verification. Additionally, decision points without landmark objects also resulted in higher PHG activation compared to empty non-decision points, indicating that this region is sensitive to the navigational salience of a decision point within a route, irrespective of the concurrent presence of a landmark object.

Furthermore, resting state functional connectivity scans were performed before and after the learning phase. This was intended to investigate how spatial learning alters the connectivity between the PHG and the rest of the brain. More specifically, changes in functional connectivity were investigated between the PHG and regions involved in egocentric and allocentric navigation respectively: the caudate nucleus and the hippocampus (Hartley, Maguire, Spiers, & Burgess, 2003; Voermans et al., 2004). The functional connectivity analysis revealed changes in connectivity between pre- and post-learning that correlated with participants’ self-reported navigational abilities as measured by the SBSOD. More specifically, SBSOD scores were found to positively correlate with the rate of post-learning connectivity increase between the PHG and the right hippocampus, but negatively with the rate of post-learning connectivity increase between the PHG and the right caudate nucleus. This finding is consistent with the idea that higher self-reported navigational abilities correlate with a preference for allocentric navigational strategies which rely on hippocampal regions. As discussed in Section 1.2, both egocentric and allocentric spatial reference frames can be computed in parallel, but such ability is susceptible to considerable between-subject variability. Accordingly, Wegman and Janzen suggest that an individual’s propensity to employ an allocentric or egocentric navigational strategy might be a function of the degree to which landmark information is transmitted from the PHG to the hippocampus or the right caudate nucleus respectively.

The post-learning resting state scan was followed by an object recognition task akin to those used in previous studies. Recognition performance was found to be higher for toys compared to non-toys, and response times were found to be faster for toys compared to non-toys. An analysis of BOLD responses to D and ND objects during the recognition task revealed higher bilateral PHG and bilateral middle occipital gyrus activation for the former. Additionally, toys resulted in higher activation in the fusiform gyrus bilaterally, right middle temporal gyrus, and right superior occipital gyrus. Non-toys, however, resulted in greater activity in the left fusiform gyrus.

While the studies presented in this thesis do not contain brain-imaging components, the studies by Janzen and colleagues provide a theoretical foundation to explore the processing of the navigational salience of landmarks. Their results constitute evidence of a network of brain regions involved in the extraction of navigational salience information during egocentric route learning and landmark recall. The results indicate that the perception of navigational salience is fast and automatic, and that decision points in a route are perceived as inherently salient by the human navigational system even in the absence of landmarks. Furthermore, certain behavioural and neurophysiological measures of landmark salience (e.g. eye tracking measures of viewing time, or fusiform gyrus activity) were also found to be modulated by factors such as task demands (e.g. the requirement to focus on specific semantic classes of objects).

This might suggest that the determination of landmark salience is, despite its speed, a complex and multifactorial process integrating different types of bottom-up and top-down information, and that the interactions between these different factors must be better understood in order to correctly model landmark salience perception in its various forms. In one such model, Caduff and Timpf (2008) have proposed that
landmark salience can be described as the vector product of three individual vectors representing Perceptual, Cognitive, and Contextual Salience. Perceptual Salience (PS) models the bottom-up allocation of attentional resources to features detected in the stream of sensory input. In the visual modality, Caduff and Timpf identify Location- and Object-based Attention (LA and OA), and Scene Context (SC) as the fundamental units of attention. LA involves the processing of visual stimuli from the entire visual field and their decomposition into feature maps that extract colour, intensity, and texture orientation information based on discontinuity, and their subsequent recombination into global saliency maps (Itti, Koch & Niebur, 1998) (Figure 1.2). OA can single out individual objects in a scene based on their structure and geometric features, such as size, shape, and orientation. SC operates at the global scene level, and integrates the other two components of perceptual salience with relevant contextual information. This component can allow the differential salience weighting and disambiguation of otherwise perceptually identical objects owing to their different spatial locations and spatial relations within the scene.

Figure 1.2 - Flow diagram of Itti and Koch's (2001) bottom-up attention model.

Cognitive Salience describes the top-down allocation of attention as a function of the viewer’s prior knowledge and experience, and it relies on the construction of mental representations of spatial environments. The availability for extraction of individual objects or environmental features from these representations is taken to be a function of their Degree of Recognition (DR) and Idiosyncratic Relevance (IR). DR is the degree of matching between a viewpoint-dependent observation of an object and a mental representation of that object created as a result of prior experiences. IR, on the other hand, is a measure of the individual familiarity one might have with an object as a result of the object’s personal, cultural, or historical significance to the observer. As such, IR increases with the number of exposures to the object and of activities related to it. For example, one’s own previous place of employment or education may have particularly high Idiosyncratic Relevance, whereas it might otherwise have very little Perceptual or Cognitive Salience for anyone else.

Contextual Salience is a measure of the degree of attention that can be allocated to potential landmarks as a function of the type of task being carried out (Task-based Context, or TC), as well as of the mode of transportation being used and amount of resources to be allocated (Modality-based Context, or MC). During the processing of route instructions, for example, TC is defined in terms of binary relations between potential landmarks and the path selection prompted by each instruction. A saliency value is therefore assigned to each pairing of path and potential landmark within the field of view, with distance and orientation between landmark and path acting as key discriminating factors. In this model, a landmark located more proximally to a turn location will be more salient to a navigator standing within view of that decision point than a more distal landmark. Relatedly, the modality being used to navigate the environment will significantly influence the navigator’s field of view and attentional allocation, so that active navigation (e.g. driving a car) will require more attentional resources than a more passive form of navigation (e.g. riding a bus). Similarly, the speed of motion (e.g. walking vs driving a motor vehicle) will contribute to the determination of a navigator’s field of view.
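To make the idea of a saliency value assigned to each pairing of path and potential landmark more concrete, the following sketch scores such a pairing from distance and orientation. The scoring formula, the `max_distance` cut-off, and the function name `task_based_salience` are illustrative assumptions of mine and are not the equations given by Caduff and Timpf (2008).

```python
import math


def task_based_salience(landmark, turn_point, path_bearing_deg, max_distance=50.0):
    """Hypothetical score in [0, 1] for one landmark-path pairing.

    landmark, turn_point: (x, y) positions in the same coordinate system.
    path_bearing_deg: bearing of the path selected at the turn location.
    max_distance: assumed distance beyond which a landmark scores zero.
    """
    dx = landmark[0] - turn_point[0]
    dy = landmark[1] - turn_point[1]
    distance = math.hypot(dx, dy)
    if distance >= max_distance:
        return 0.0
    # Proximity falls linearly from 1 (at the turn) to 0 (at max_distance).
    proximity = 1.0 - distance / max_distance
    if distance == 0.0:
        return proximity  # landmark sits on the turn location itself
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between landmark bearing and path bearing.
    offset = abs((bearing - path_bearing_deg + 180.0) % 360.0 - 180.0)
    alignment = 1.0 - offset / 180.0
    return proximity * alignment
```

Under these assumptions, a landmark close to the turn location and lying in the direction of the selected path receives a higher score than a more distal one, which reproduces the qualitative ordering described above. Any other monotonic combination of distance and orientation would serve the same illustrative purpose.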

Additionally, Caduff and Timpf’s (2008) theoretical framework models the online sequence of events involved in determining landmark salience during navigation. In a first stage, sensory stimuli are stored in a Sensory Memory. Here, those stimuli undergo parallel Pre-Attentive processing whereby low-level visual properties of the stimuli are identified, individual objects discriminated, and Perceptual Representations built in Working Memory. Such representations then undergo sequential processing, implementing the top-down Cognitive Salience and Contextual Salience components, which, in turn, modulate Perceptual Salience. The objects and their respective salience profiles are then encoded or updated in Long-term Memory.

Crucially, while in its formulation this model is primarily concerned with the visual modality during active navigation, it is flexible enough to also account for the allocation of attentional resources to landmark salience determination during the processing of spatial language, and will therefore be of relevance when interpreting the results of the experiments presented in this thesis. Furthermore, as evidenced by the study by Wegman and Janzen (2011) described in this section, eye movements could potentially be extremely valuable in studying the allocation of attention to landmarks or other navigationally salient features. In Experiments 1-3 I extended this use of eye tracking to an analysis of attention allocation to landmark words in spatial texts and to landmark regions of map-like representations. This was done in order to study how the allocation of attentional resources (measured, for example, as changes in the number and duration of fixations) may be modulated by manipulations of the reference frames implicit in the route descriptions or of the imagined spatial perspective adopted by the reader. More generally, the goal of this research was to gain some insight into the various forms of mental representations that might mediate between the extraction of navigational information from language and its use in the process of carrying out visuospatial tasks.

In order to provide a solid theoretical foundation for the research direction outlined here, in the next few sections of this chapter I will explore the literature on mental imagery and its connections to spatial cognition and navigation. Chapter 2 will then cover key research into the processing of spatial language and the imagery processes with which it interacts. In Section 2.6, I will then explore research on how eye movements can inform our understanding of attention allocation (and of related processes) as well as mental imagery processes during active navigation, language processing, and, more generally, during spatial cognitive tasks.

1.4. The Organisation of Spatial Knowledge

In Sections 1.2 and 1.3 I introduced two key concepts for our understanding of navigation and spatial knowledge. As we familiarise ourselves with an environment, we do so by encoding the identity of salient landmarks and associating that information with an understanding of their spatial locations. These locations can be specified with respect to our own body-centred frame of reference or with respect to each other (or, indeed, both). Additionally, this knowledge must be stored and maintained in enduring representations that allow us to directly navigate an environment by, for example, following a prominent beacon-like environmental feature, but that can also support more complex navigational behaviours (e.g. mentally planning a route through an environment in which we are not currently located, or constructing linguistic descriptions of it).

The nature, format, and content of these representations have been the subject of intense research since the mid-20th century. In studying the navigational behaviour of rats, Tolman (1948) challenged the idea that spatial learning was merely due to the learning of sequences of stimulus-response associations, with the strength of these associations varying as a function of incoming sensory input. Instead, he found not only that the rats were able to learn the configuration of a maze in order to reach a reward (i.e. food or water), but also that this learning took
place during non-rewarded trials. Additionally, he observed that the animals were able to plot an alternative route to a goal location (or to nearby locations) when the configuration of the maze was changed compared to their learning phase (e.g. by rotating the starting point of the maze by 180° relative to the room). He concluded that the rats could not have been relying on purely body-centred stimulus-response associations, but rather had developed a more comprehensive understanding of the spatial environment. On this basis, he hypothesised that the acquisition of spatial information is accompanied by its progressive organisation “into a tentative cognitive-like map of the environment indicating routes and paths and environmental relationships” (p. 192). Kuipers (1978) stated “the cognitive map is like a map in the head. More accurately, it is like many maps in the head, loosely related, for the cognitive map certainly lacks the global consistency of a single printed map” (p. 132). He termed a collection of loosely connected cognitive maps of varying levels of detail and at different scales a cognitive atlas (Kuipers, 1982), and acknowledged the phenomenological experience of cognitive maps, observing that “some people claim to ‘see’ a map when they answer spatial questions” (Kuipers, 1978, p. 132). A cognitive map was also seen as a network of streets and intersections, and a catalogue of routes, each route being “a procedure for getting from one place to another […]” (p. 132). Denis and Zimmer (1992) described cognitive maps as “[…] internal representations of spatial environments, their metric properties, and the topological relationships linking their landmarks” (p. 286).

Since then, however, the map-like nature of cognitive maps has been challenged. Tversky (1981; 1992) has presented evidence of systematic distortions and heuristics in subjects’ spatial memories for locations and orientations. For example, figures within an array tend to be remembered as more closely grouped and aligned to the canonical reference axes (vertical and horizontal, or north-south-east-west) than they were in the original percept (Tversky, 1981; 1992). Additionally, curved paths are remembered as straighter than they are (Chase, 1983; Milgram & Jodelet, 1976), and landmark salience can generate asymmetries in distance judgements between salient landmarks and non-landmarks, depending on which is used as referent (McNamara & Diwadkar, 1997; Sadalla, Burroughs & Staplin, 1980). Furthermore, Holyoak and Mah (1982) observed that when participants were asked to assume a particular perspective or geographical viewpoint, they judged the distances between pairs of nearby cities (relative to the imagined viewpoint, termed cognitive perspective) to be larger than the distances between pairs of more distant cities. On the basis of these and other findings (for a more detailed review, see Tversky, 2000), Tversky (1993) introduced the notion of cognitive collage to define these error-prone representations of novel spaces resulting from the integration of multimodal information and knowledge, both spatial and non-spatial. This has more generally led to the idea that spatial cognition may rely on a multitude of different knowledge structures – ranging from more percept-like, metric and detailed (e.g.
mental images), to more abstract and topological (e.g. mental models) – computed ad-hoc from a number of different sources of information and to achieve specific goals (Mark, Freksa, Hirtle, Lloyd & Tversky, 1999). These structures will be discussed in turn in the following sections, creating a thematic bridge between spatial cognition and the broader domain of mental imagery research. This chapter will also introduce the idea of perceptual simulation as an additional form of mental imagery, potentially filling the gaps between what Tversky (2000) referred to as the Overview and View levels (corresponding to survey, or allocentric, and egocentric representations), and the Action level. However, I will begin by introducing the notion of mental imagery and offering a brief historical overview of the development of imagery as an area of research.

1.5. Mental Representations and Imagery – A Brief History

Although the scientific study of mental representations in its current incarnation was developed after the cognitive revolution of the 1950s, the Greek philosophers Aristotle and Plato were already aware of its relevance to understanding the human mind and cognition. Referring to mental images as phantasmata, Aristotle described them as “a residue of the actual [sense] impression” and considered them to be central to his theory of memory, going as far as to claim that “It is impossible to think without an image [phantasma]” (De Memoria 450a 1, as quoted by Thomas, 2016). Although imagery continued to play a role in the work of several philosophers, such as Descartes, Hobbes, and Locke, it wasn’t until the late 19th and early 20th centuries that mental imagery began to be studied in the emerging discipline of psychology. Widely regarded as one of the founders of experimental psychology, Wilhelm Wundt championed a view of mental images that emphasised their percept-like nature, and described them as “[…] ideas [that] do not represent things of immediate perception; briefly expressed, they originate in feeling, in emotional processes which are projected outward into the environment. This is an important and particularly characteristic group of primitive ideas. Included within it are all references to that which is not directly amenable to perception but, transcending this, is really supersensuous, even though appearing in the form of sensible ideas” (Wundt, 1916/2013, p. 75).

The view of mental imagery as an important psychological phenomenon in early experimental psychology was short-lived. In Würzburg, Germany, Oswald Külpe, a former student of Wundt’s, and his students began employing introspection and word association methods to study mental representations. Over the course of these experiments, participants frequently reported experiencing “events of

consciousness which they could quite clea