
Experiment 1: A Cognitive Robotic Model of Mental Rotation

In document Mental Imagery in Humanoid Robots (Page 88-105)

Modelling Mental Rotation

5.1 Experiment 1: A Cognitive Robotic Model of Mental Rotation

In this experiment we propose a computational model that investigates a neural operational hypothesis on how the information processing taking place in parietal and premotor areas might be involved in mental rotation. This operational hypothesis is based on the integration of the affordance and forward model accounts of mental rotation. The processes involved in mental rotation include (Lamm et al., 2007): (a) stimulus encoding and mental image generation, (b) planning and execution of the mental rotation, (c) comparison (matching) of the rotated stimulus with the target stimulus, and finally (d) execution of the same/different response. Combining the two perspectives within the model allows us to deal with all the levels of complexity required by a mental rotation task: not only processes “a-b” indicated above (mental rotation proper), but also processes “c-d” (control and exploitation of the mental rotation processes).

To this purpose, the model builds on the computational model “TRoPICALS” (Caligiore et al., 2008; 2010; 2012), developed to study affordance compatibility effects (Tucker & Ellis, 2001). The TRoPICALS model is a good starting point for designing a model of mental rotation, as it reproduces some key functions of the parietal-premotor circuit that are crucial for stimulus encoding and extraction of object affordances (process “a”). TRoPICALS also includes important features of the prefrontal-premotor circuit, pivotal for managing other aspects of mental rotation (processes “c” and “d”). However, it cannot perform mental image rotations, as it lacks the necessary feedback circuits. To address the core mental rotation process (process “b”), the model proposed here therefore enhances the functions of TRoPICALS with two new key features. First, it is endowed with premotor-parietal feedback loops that allow it to implement mental rotation and sensory prediction based on forward models. Second, it is endowed with an improved visual and motor system allowing it to scale up to more realistic 3D environments and robotic setups.

The rest of this section is organized as follows. Sec. 5.1.1 discusses the main features of the model, the learning algorithms used to train it, and the robotic setup used to validate it. Sec. 5.1.2 presents and discusses the results. Sec. 5.1.3 extends the discussion and proposes future work to improve the model.

5.1.1 Methods

A. Neural Architecture

The model proposed here represents an operational hypothesis on how visual and motor neural processes might interplay during mental rotation. To this purpose it extends some features of the TRoPICALS model (Caligiore et al., 2010). Figure 5.1 shows the model architecture, which consists of three main parts corresponding to specific areas of the brain mainly involved during mental rotation tasks (Lamm et al., 2007; Richter et al., 2000): the parietal cortex (PC), the premotor cortex (PMC), and the prefrontal cortex (PFC). These areas are represented by distinct neural maps activated using population code methods (Pouget, Dayan, & Zemel, 2003; Deneve, Latham, & Pouget, 1999). The population code hypothesis postulates that information, e.g., on stimuli and actions, is encoded in the brain through the activation of populations of neurons that are organized in neural maps and have broad response fields. In particular, each neuron responds maximally to a certain value of the variables to be encoded, and progressively less intensely to values farther from it (following a Gaussian function).
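The Gaussian population code described above can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional map of 13 neurons whose preferred orientations span -90 to 90 degrees in 15-degree steps; the tuning width `sigma` and the function names are illustrative assumptions, not parameters taken from the model.

```python
import math

def population_code(value, preferred_values, sigma):
    """Gaussian tuning: each neuron responds most strongly at its preferred value."""
    return [math.exp(-((value - p) ** 2) / (2.0 * sigma ** 2))
            for p in preferred_values]

def decode(activations, preferred_values):
    """Read the encoded value back as the activation-weighted mean."""
    total = sum(activations)
    return sum(a * p for a, p in zip(activations, preferred_values)) / total

# Hypothetical 1-D map: preferred orientations from -90 to 90 degrees.
preferred = [-90 + 15 * i for i in range(13)]
activity = population_code(30.0, preferred, sigma=15.0)  # sigma is an assumption
peak = preferred[activity.index(max(activity))]
estimate = decode(activity, preferred)
```

The most active neuron is the one tuned to the encoded orientation, and its neighbours respond with decreasing intensity, which is what gives each map its broad response field.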

The neurons of the PC map (32 x 32 neurons) encode the shape and orientation of the object that has to be mentally rotated (Caligiore et al., 2013). The PMC consists of 2 neural maps, PMC_1 (31 x 105 neurons) and PMC_2 (10 x 20 neurons), encoding motor programs related to different arm parts (Wolpert & Kawato, 1998). PMC_1 neurons encode a specific wrist posture of the robot corresponding to a specific object orientation encoded in PC. PMC_2 neurons encode the two different hand postures that the robot produces to report the result of the mental rotation (i.e., to indicate whether two objects are the same or different). In more detail, the model works with 2 different types of object, each with 13 different orientations. Therefore, the neurons of the PMC_1 map encode 26 possible wrist postures.

Figure 5.1 The model of mental image rotation. Each box represents one of the model’s components. The arrows represent information flows from one component to another. The arrows accompanied by the letter “C” are the connections learned by the SOM learning rule (dashed arrows) or by the Hebbian learning rule (solid arrows). ©2013 IEEE

The PFC also has two maps, implementing the working memory (PFC_1, 32 x 32 neurons) and the matching process area (PFC_2, 32 x 64 neurons) (Fuster, 2001). The visual input for the model is the image from a simulated camera in one of the eyes of a simulated iCub robot. This image goes through an edge detection module that extracts the edge information of the two objects shown in front of the robot. The edge information for the object on the left is passed to the PC, while that for the “target object” on the right is passed to PFC_1. The target object is used as a reference for rotational purposes. The robot has to mentally rotate the object encoded by PC to check whether it is the same as or different from the target object stored within PFC_1. PFC_2 is the core of the matching process. It is formed by a Kohonen self-organizing map (SOM) (Kohonen, 2001), which takes inputs from the PC and PFC_1. A major processing characteristic of the SOM is clustering: it transforms high dimensional inputs into low dimensional ones, so that each input is represented in a unique area of the SOM map. We exploit this characteristic to create a neural map that represents pairs of stimuli. At the end of the matching process, PFC_2 neurons trigger the activation of PMC_2, whose neurons in turn encode the answering behaviour.

The mental rotation process is mainly based on the interactions between the PC and the PMC_1. Consistently with the concept of affordances, the visual features of the object (shape and orientation) encoded by the PC cause a specific cluster of neural activity in PMC_1. This pattern encodes the motor response to the seen object (i.e., a specific wrist rotation either clockwise or counter-clockwise, represented in terms of the posture assumed by the robot’s wrist). Conversely, the PMC_1-PC circuit works as a forward model, based on which cluster of activity in PMC_1 causes a change of the image orientation in the PC.

B. Learning process

Connections between maps are trained using Hebbian learning and SOM competitive learning. Hebbian learning is widely accepted as a biologically plausible learning mechanism mainly involving cortical areas (Doya, 2000). This learning mechanism underlies some developmental phenomena. One example is the critical period of learning (Munakata & Pfaffly, 2004), after which synaptic efficacy can no longer be modified or re-formed once it has settled.

At the beginning of the simulation, the weights of all the connections (C1, C2, C3, C4, and C5) are randomly set within the range [0, 0.1]. Then the simulated mental rotation experiment follows 4 steps: 1) Stimulus encoding, assigning the edge information of the left stimulus to PC. 2) Execution of the mental rotation, repeating the interaction between the affordance (PC-PMC_1 circuit) and forward model (PMC_1-PC circuit) processes (C1, C2 connections). 3) Comparison, performing the matching process between the mental image and the target image in the SOM map (PFC_2; C3, C4 connections). 4) Answer triggering, executing the same/different response (PFC_2-PMC_2 circuit; C5 connection).

The connections C1 are used to simulate affordance learning through the transformation of information from PC to PMC_1 (Fagg & Arbib, 1998). The training set consists of pairs formed by the left stimulus (within PC) and a specific cluster of activity (Gaussian tuning curve) that represents the affordance provided in PMC_1 (i.e., the robot's wrist angle). For each pair, the C1 connections are trained by using the Hebbian learning rule (Eq. 4.1).

After training C1, an image of an object in the PC causes a specific cluster of activity in PMC_1, which represents a wrist posture. Through training, the network learns how to rotate the robot’s wrist to match the orientation of a seen object.
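A toy sketch of this kind of pairing is given below. Eq. 4.1 is not reproduced in this section, so the sketch assumes the plain Hebbian rule (weight change proportional to the product of pre- and post-synaptic activity); the map sizes, learning rate `eta`, number of epochs and input patterns are all hypothetical and far smaller than the PC and PMC_1 maps. Only the initial weight range [0, 0.1] is taken from the text.

```python
import math
import random

def gaussian_bump(center, size, sigma=1.0):
    """Target cluster of activity centred on one post-synaptic neuron."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]

def hebbian_train(pairs, n_pre, n_post, eta=0.5, epochs=10, seed=0):
    """Assumed plain Hebbian rule: w[j][i] += eta * post[j] * pre[i]."""
    rng = random.Random(seed)
    # Initial weights drawn from [0, 0.1], as stated in the text.
    w = [[rng.uniform(0.0, 0.1) for _ in range(n_pre)] for _ in range(n_post)]
    for _ in range(epochs):
        for pre, post in pairs:
            for j in range(n_post):
                for i in range(n_pre):
                    w[j][i] += eta * post[j] * pre[i]
    return w

def propagate(w, pre):
    """Weighted sum of the pre-synaptic pattern for every post-synaptic neuron."""
    return [sum(wji * x for wji, x in zip(row, pre)) for row in w]

# Two hypothetical binary edge patterns, each paired with a bump in a 9-neuron map.
edges_a = [1, 1, 0, 0, 1, 0]
edges_b = [0, 0, 1, 1, 0, 1]
pairs = [(edges_a, gaussian_bump(2, 9)), (edges_b, gaussian_bump(6, 9))]
weights = hebbian_train(pairs, n_pre=6, n_post=9)
winner_a = max(range(9), key=lambda j: propagate(weights, edges_a)[j])
winner_b = max(range(9), key=lambda j: propagate(weights, edges_b)[j])
```

After training, presenting either pattern alone produces the strongest response at the neuron it was paired with. The plain rule lets weights grow without bound, so a practical version would normally add normalisation or decay.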

The connection C2 is responsible for forward model learning. In contrast to the affordance processing, this connection causes the formation of an image representation in PC from the cluster of activity in PMC_1. For instance, a cluster of activity in PMC_1 that is caused by an image of an object rotated by 90 degrees in PC (during affordance processing) causes an image rotated by 75 degrees to be formed back in PC. This training strategy allows the network to create a series of rotating images. Note that the training set causes an image to change gradually until it becomes the same as the image of the object at 0 degrees. This corresponds to the central position in the PMC_1 map, which refers to the target position. When the rotation of an input image is greater than 0 degrees, the image will be rotated to the right (clockwise). In contrast, if the angle is less than 0 degrees, the image will be rotated to the left (counter-clockwise). The C2 connections are also trained with the Hebbian learning rule.

The process of mental image rotation consists of the repetition of the interaction between the affordance process (connection C1) and the forward model process (connection C2), until an image in the PC reaches the 0 degrees target rotation. Each cycle of the interaction causes a rotated image, which can be considered a mental image because the actual input object orientation does not change.
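The cycle described above can be sketched at the level of angles rather than neural maps. In this idealised sketch the two functions are hypothetical stand-ins for the trained C1 and C2 connections: the affordance step snaps the seen orientation to the nearest trained wrist posture, and the forward model returns an image one 15-degree step closer to 0 degrees. The cap of 10 cycles follows the limit reported in the results.

```python
STEP = 15         # degrees between trained orientations
MAX_CYCLES = 10   # upper bound on rotation cycles, as in the experiment

def affordance(image_angle):
    """C1 stand-in: seen orientation -> nearest trained wrist posture (degrees)."""
    return max(-90, min(90, STEP * round(image_angle / STEP)))

def forward_model(wrist_angle):
    """C2 stand-in: wrist posture -> orientation of the predicted image."""
    if wrist_angle > 0:
        return wrist_angle - STEP   # positive angles are rotated clockwise
    if wrist_angle < 0:
        return wrist_angle + STEP   # negative angles are rotated counter-clockwise
    return 0

def mentally_rotate(start_angle, max_cycles=MAX_CYCLES):
    """Alternate affordance and forward model until the image reaches 0 degrees."""
    angle, trajectory = start_angle, [start_angle]
    while angle != 0 and len(trajectory) - 1 < max_cycles:
        angle = forward_model(affordance(angle))
        trajectory.append(angle)
    return trajectory

path = mentally_rotate(60)
cycles = len(path) - 1
```

Because each cycle removes one step of angular disparity, the number of cycles in this sketch grows linearly with the starting angle. The neural model can deviate from this ideal, for example when a cycle fails to change the image.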

The connections from PC and PFC_1 to the SOM PFC_2 (C3, C4) are responsible for the matching process. When the network generates a mental image in the PC with a 0 degrees rotation, the learning process is triggered. The connections link two maps to PFC_2: PFC_1 (the target image), which is set at the beginning of the simulation, and PC (the mental image). The training set for PFC_2 is the combination of all the possible neural representations of the stimuli for each input. The neural activity in PFC_2 forms a salient cluster for each specific pair of inputs. As there are two possible images in each map, four clusters will be formed. To train PFC_2, the SOM learning rule (Eq. 4.3) was used.

The PFC_2 SOM map is trained in advance. In this way, a response of PMC_2 can be fixed for each input couple from PC and PFC_1.
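As a rough illustration of how a SOM can be trained on pairs of stimuli, the sketch below uses the standard Kohonen update on a small one-dimensional map. Eq. 4.3 is not reproduced in this section, so the update rule, the decay schedules, the map size and the two-element codes standing in for the PC and PFC_1 patterns are all assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_som(inputs, n_units=8, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Standard Kohonen rule (assumed): w += lr * h(unit, bmu) * (x - w)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    weights = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.3     # neighbourhood shrinks, stays positive
        for x in inputs:
            bmu = best_matching_unit(weights, x)
            for u, w in enumerate(weights):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical stand-ins for the neural codes of object-A and object-B.
object_a, object_b = [1.0, 0.0], [0.0, 1.0]
pairs = {
    "A-A": object_a + object_a, "A-B": object_a + object_b,
    "B-A": object_b + object_a, "B-B": object_b + object_b,
}
som = train_som(list(pairs.values()))
bmus = {name: best_matching_unit(som, x) for name, x in pairs.items()}
```

Each input is the concatenation of a mental image code and a target image code, so the map learns about pairs rather than single objects. With more units than input pairs, the pairs usually settle on separate regions, although this small sketch does not guarantee it.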

The answer triggering process uses the connection C5 from PFC_2 to PMC_2. When two images are “similar” the robot chooses the “YES” answer, otherwise it chooses the “NO” answer. The term “similar” means “approximately the same”. The mental rotation ends when the position of the cluster of activity in PMC_1 is close to the central position. The most salient cluster in PFC_2 is used to produce the answer. Given the four possible combinations of inputs in the matching process, two of them are responsible for a "SAME" answer, while the remaining two are responsible for the "MIRROR" answer. Therefore, the two regions in PFC_2 that correspond to the same image in the PC and PFC_1 cause one cluster in PMC_2, while the two other regions within PFC_2 represent different images in the two input maps. In this process, PMC_2 is responsible for the answer triggering, i.e., the motor response of pressing one of two answer buttons or of producing an utterance such as “YES” or “NO”. In the current version of the model this motor command is not yet used to supply a control signal for the iCub, but is directly interpreted as the response of the system.

After learning, an action potential of each neuron in the PMC_2 map is calculated by using a dynamic competition method (Doya, 2000). As the connections within a neural map are based on an all-to-all pattern, each neuron in the map sends/receives signals to/from every neuron. The dynamic competition process causes dynamic activities within the map, based on a distance between neurons following the rule of long-range inhibition and short-range excitation. Neighbouring neurons which are activated with high potential will receive excitatory signals and tend to form clusters of activity. In contrast, the neurons which are far from the active neuron in the neural space will receive an inhibition signal and their action potential will be depressed.

The dynamic competition is also used as a method to calculate an agent’s response time, e.g., to compare the model results with reaction time data from psychology experiments. Unlike a simple feed-forward process in layered neural networks, the dynamic competition process is repeated until the action potential of at least one neuron in the neural map reaches a specific threshold. This process can be used to calculate the response time based on the action potential of the individual neuron that is most sensitive to a particular input. In detail, the number of repetitions of the dynamic competition process was recorded and used as a simulated response time. One cycle of the process is assumed to be equal to 1 millisecond (Caligiore et al., 2008).
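The sketch below illustrates how a cycle count can serve as a simulated response time. It is a simplified leaky competition on a hypothetical five-neuron map: the nearest-neighbour excitation, uniform long-range inhibition, leak, time step and threshold are all illustrative assumptions rather than the model's actual equations or parameter values.

```python
def lateral_weight(i, j, excite=0.5, inhibit=0.5):
    """Short-range excitation, long-range inhibition; no self-connection."""
    if i == j:
        return 0.0
    return excite if abs(i - j) == 1 else -inhibit

def response_time(inputs, threshold=1.0, dt=0.1, leak=1.0, max_cycles=1000):
    """Repeat the competition until one neuron reaches the threshold.

    Returns the number of cycles used (one cycle is read as 1 ms),
    or None if no neuron reaches the threshold within max_cycles.
    """
    n = len(inputs)
    act = [0.0] * n
    for cycle in range(1, max_cycles + 1):
        new_act = []
        for i in range(n):
            lateral = sum(lateral_weight(i, j) * act[j] for j in range(n))
            drive = inputs[i] + lateral - leak * act[i]
            new_act.append(max(0.0, act[i] + dt * drive))  # potentials stay non-negative
        act = new_act
        if max(act) >= threshold:
            return cycle
    return None

rt_weak = response_time([0.0, 0.0, 2.0, 0.0, 0.0])    # weaker drive from the input map
rt_strong = response_time([0.0, 0.0, 4.0, 0.0, 0.0])  # stronger drive
```

In this sketch a stronger input drives the winning neuron to threshold in fewer cycles, so the recorded cycle count behaves like a reaction time that shortens as the evidence for one answer becomes clearer.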

C. The simulated participant (the iCub robot)

According to the view of embodied cognition (Pecher & Zwaan, 2005; Pezzulo et al., 2011), our cognitive capabilities to recognise and understand things have been shaped by the interaction processes between body, brain and environment. In addition, cognition is based on internal representations and simulations of real world actions and our perception (Barsalou, 1999).

Cognitive robotics platforms, such as humanoid robots, are being increasingly used to model embodied cognition and cognitive development in humans by means of embodiment (Caligiore et al., 2008; 2010; Cangelosi & Schlesinger, 2015). Following this approach, a simulation model of the humanoid robot iCub was used to model psychological experiments on the embodiment bases of mental rotation.

Each arm of the iCub has 16 joints. This experiment uses joint number 5 of the right arm, which directly affects the robot's wrist angle. If the robot holds an object with the right hand, rotating the wrist will only change the object's orientation in the object plane.

Figure 5.2 The iCub simulator and its environment. ©2013 IEEE

D. Stimuli and Simulated Mental Rotation Task

The visual stimuli use an abstract object, coloured in red, similar to an upside down letter L, as shown in Figure 5.3. In this experimental setup, two versions of the stimulus are used, each being the mirror image of the other; they are called object-A and object-B. The objects are displayed in the space in front of the simulated iCub (Figure 5.2). During the process of affordance training, only one stimulus is shown, in the left position, with the experimenter varying the orientation of the object and assigning a corresponding target position for the robot's wrist angle. In the testing session, two stimuli are displayed, in the left and in the right positions. In each trial, the rotation of the left image is systematically varied, while the right one is presented with a 0 degrees orientation and can be either object-A or object-B.

Figure 5.3 The two stimuli used for the simulated mental rotation task (panels: object-A and object-B). Both stimuli are coloured in red for edge detection. ©2013 IEEE

Edge detection is used as an early visual processing stage. The image is centred on a single object, and a red colour filter is applied. The edges of the object are extracted with the Canny edge detection technique (Canny, 1986), using the OpenCV library. The output of the edge detection process consists of binary data which can be directly assigned as an activity level to PC and PFC_1 at the beginning of the simulation. Note that the eye position of the iCub was fixed, and the object of interest was extracted and placed in the centre of the image maps (e.g., V1) throughout the experiment.

Regarding the motor response, the iCub’s wrist angle is limited to the range of [-90; 90] degrees. Counter-clockwise orientations are indicated by positive values, while clockwise orientations are indicated by negative values. For example, in Figure 5.3 object-A has a 45 degrees orientation while object-B has a -45 degrees one.

5.1.2 Results

The right object is always shown at a 0 degrees rotation, while the left object can vary in orientation between 90 and -90 degrees. Therefore the maximum angular disparity between the two stimuli is 90 degrees. Varying the orientation in steps of 15 degrees (0, 15, 30, 45, 60, 75), as we did, typically requires a maximum mental rotation of 6 steps in the PMC_1 map. However, in the experiment the maximum number of rotation cycles is set to 10, as in some cases the model cannot rotate the image to a preferred orientation at the first cycle, thus requiring extra rotations. When the number of rotation cycles reaches 10, this indicates that the model cannot correctly perform the mental image rotation of the left stimulus, and it is forced to move on to the next step (the matching process) using the last image. The interaction between the affordance and forward model processes leads the model to produce a linear relationship between the angular disparity and the number of steps used in the rotation.

The experiment is conducted using two groups of inputs, one for a recognition test and another for a generalization test. In the recognition test, the orientations of the left stimulus are the same as in the training set, varying by 15 degrees per pattern from 90 to -90. As there are two possible objects and each of them can have 13 possible orientations, this test has exactly 13x4=52 different pairs of stimuli to be used as input. The generalization test refers to testing the model with unseen orientations. The left stimulus in this test changes in steps of 5 degrees from 90 to -90 but skips the values already used in the previous test. Therefore, the generalization test has (37x4)-52 = 96 pairs of stimuli to be used as input. Both tests were repeated 52 times to record the consistency of the model performance. The result shown in Figure 5.5a is a series of mean values of the response time in the recognition test.
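The pair counts above can be checked by enumerating the combinations. In this sketch each test pair is represented as a hypothetical tuple (left object, left orientation, right object), with the right object always at 0 degrees.

```python
from itertools import product

objects = ["A", "B"]
trained_angles = list(range(-90, 91, 15))  # 13 orientations used in training
fine_angles = list(range(-90, 91, 5))      # 37 orientations in 5-degree steps

# Recognition test: trained orientations of the left object, either target object.
recognition = list(product(objects, trained_angles, objects))

# Generalization test: 5-degree steps, skipping orientations already seen in training.
generalization = [
    (left, angle, right)
    for left, angle, right in product(objects, fine_angles, objects)
    if angle not in trained_angles
]

n_recognition = len(recognition)
n_generalization = len(generalization)
```

The factor of 4 in both counts comes from the two possible left objects combined with the two possible target objects.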

Figure 5.4 Mental image rotation steps. a) Rotational steps in the case that the model is able to create a series of image changes to reach the 0 degrees default orientation; b) the model is unable to rotate the seen object. ©2013 IEEE

Figure 5.4a shows the mental rotation steps (PC) and the matching (PFC_2) and answering (PMC_2) processes for a successful trial. In this example the mental rotation process takes 4 steps to rotate the image of a stimulus at 60 degrees into the image of a stimulus at 0 degrees. The mental rotation process ends when the rotated image reaches the 0 degrees orientation. After that, the matching process within PFC_2 is performed using as input the neural activity of the target image in PFC_1 and of the rotated image in PC. The neural activation representing the matching process within PFC_2 is shown in the third column of the last row in Figure 5.4a. The answering process of PMC_2 is shown in the fourth column of the last row in Figure 5.4a. The cluster of activity formed in the left side of the map will cause the answer "YES" to be chosen. The blank panels indicate that fewer than 10 rotational steps were needed in this example.

In contrast, Figure 5.4b shows one case in which the model cannot rotate the left stimulus (object-A at -90 degrees) into the 0 degrees default position. The model fails to rotate the image within 10 cycles and has to perform the matching process using the last (un-rotated) image in PC. This is similar to guessing in human subjects when the time allowed for a mental rotation task runs out. When the model fails in this way, the image in the PC remains the same at every cycle. This might be caused by a similarity effect in the edge information of the objects in the training set. Indeed, the edge patterns of object-A and object-B at 90 and -90 degrees are similar, as they mostly lie on the horizontal axis in the centre of the map. This means the model has to learn to match 4 similar inputs related to 4 separated
