Model Overview
5.5 Model Implementation
The proposed neural network model has been implemented using the PyBrain artificial neural network library for Python (Schaul et al., 2010), which considerably improved the portability of the code. All input and output units of the network used a linear activation function, while the units of the hidden layer used the logistic activation function. The training algorithms (RProp− and backpropagation through time) were those implemented in the PyBrain library.
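The implementation details above can be illustrated with a minimal, dependency-free Python sketch. It does not use PyBrain and is not the code used in this thesis. The function names (`forward`, `rprop_minus_step`) and the omission of bias units are assumptions made for brevity. The sketch first performs a forward pass through linear input units, logistic hidden units and linear output units. It then performs one step of Rprop− (Rprop without weight-backtracking), using the step-size constants commonly quoted for Rprop (η⁺ = 1.2, η⁻ = 0.5, step bounds 1e-6 and 50); these may differ from PyBrain's defaults.

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation used by the hidden units."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass: linear inputs -> logistic hidden units -> linear outputs.

    Linear input units pass x through unchanged; each output is a plain
    weighted sum of hidden activations. Biases are omitted (sketch assumption).
    """
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minus_step(weights, grads, prev_grads, steps,
                     eta_plus=1.2, eta_minus=0.5,
                     step_min=1e-6, step_max=50.0):
    """One Rprop- update (Rprop without weight-backtracking).

    Each weight has its own step size, which grows by eta_plus while the
    error gradient keeps its sign and shrinks by eta_minus when the sign
    flips. The weight then moves against the sign of the current gradient
    by that step size; the gradient's magnitude is not used.
    """
    new_weights, new_steps = [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:
            s = min(s * eta_plus, step_max)
        elif g * pg < 0:
            s = max(s * eta_minus, step_min)
        new_weights.append(w - sign(g) * s)
        new_steps.append(s)
    return new_weights, new_steps

# Usage: a 2-2-1 network, then one Rprop- step on three weights.
print(forward([1.0, 0.0], [[0.5, -0.5], [0.0, 1.0]], [[1.0, 1.0]]))
print(rprop_minus_step([0.0, 0.0, 0.0], [1.0, -1.0, 0.5],
                       [1.0, 1.0, 0.0], [0.1, 0.1, 0.1]))
```

Each weight moves by its own adaptive step size, and only the sign of the gradient is used. As a result, Rprop is insensitive to the overall scale of the error gradient. Backpropagation through time, which would supply these gradients for the recurrent network, is not sketched here.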
5.6 Discussion
As a means of summarising the description of the model presented in this chapter, it is appropriate to discuss the proposed architecture in light of past efforts to model various aspects of counting, reviewed in chapter 3.
In contrast to the models of Amit (1988), Hoekstra (1992), and Rodriguez et al. (1999), the model counts static, rather than sequential, stimuli. It can be argued that this corresponds more closely to the context in which children gain most of their initial experience with counting. When children enumerate the toys they are playing with, or the pictures in a book they read with their parents, the set being counted is static, and its numerosity is related to its spatial, rather than temporal, characteristics. Counting sequential (e.g. auditory) stimuli is of course a realistic scenario, but it is not the one that primarily contributes to children's acquisition of the counting skill. The distinction between enumerating spatially and temporally conveyed sets is particularly important when investigating the contribution of gestures to learning to count. This question is closely connected with research questions 2 and 3, which the model aims to address. As discussed in chapter 2, establishing a correspondence between the spatial aspect of the visually presented set and the temporal aspect of the recited count list is one of the crucial elements of mastering counting. It is also one in which gestures are likely to be particularly helpful. It is therefore most appropriate to consider static stimuli in the present study.
Although learning to recite the sequence of number words is one of the important aspects of the training regime of the proposed model (section 5.3.1), it is worth stressing that a detailed reproduction of the subtleties of the equivalent process in children is not the main focus of the present work. Ma and Hirai (1989) proposed a model capable of explaining many of the phenomena that arise in this context.
While a simulation focusing on the learning of the count list is conducted in chapter 6, the proposed neural network is not expected to exhibit all the minute details of this process. For example, it is not expected to reproduce the effects connected with the phonetic similarity between certain number words in the English language (Fuson et al., 1982). This is because the sequence-learning mechanism of the employed neural network architecture is unlikely to be the best available model of rote learning in humans. Instead, the simulations place more emphasis on how mastering the count list may affect the subsequent process of learning to count (in connection with research question 1).
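The sketch below makes the sequence-learning framing concrete by turning the count list into next-word prediction pairs for a recurrent network. At each step, the network receives the current number word and is trained to produce the following one. The localist (one-hot) word encoding, the ten-word list and the helper names are hypothetical. This section does not specify the encoding, so the one used in the model may differ.

```python
COUNT_LIST = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten"]

def one_hot(index, size):
    """Localist code: a single active unit marks one number word."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def recitation_pairs(words):
    """(input, target) pairs: the current word as input, the next word as target."""
    n = len(words)
    return [(one_hot(i, n), one_hot(i + 1, n)) for i in range(n - 1)]

def decode(vec, words):
    """Map an output vector back to the word of its most active unit."""
    return words[vec.index(max(vec))]

pairs = recitation_pairs(COUNT_LIST)
for inp, target in pairs[:3]:
    print(decode(inp, COUNT_LIST), "->", decode(target, COUNT_LIST))
# one -> two
# two -> three
# three -> four
```

With one-hot codes, every pair of number words is equally dissimilar. Under this assumption, phonetic-similarity effects of the kind reported by Fuson et al. (1982) could not emerge from the representation alone.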
Many similarities can be found between the model described herein and the one proposed by Ahmad et al. (2002); however, there are important differences between the two approaches. A prevailing theme in the models of Ahmad et al. is the application of the mixture of experts architecture at several levels of the model design.
While it is undoubtedly an elegant and interesting machine learning device, Ahmad et al. provide no justification for employing this solution so extensively, e.g. in the form of evidence of its biological plausibility. Furthermore, in at least some of the instances where Ahmad et al. insert the mixture of experts model, its application can be argued to be superfluous. An example worth focusing on is the counting module of their scousyst model (Ahmad et al., 2002, pp. 187–197). It is composed of two subsystems: word, a simple recurrent network (Jordan type), and next-object, a feed-forward network. Ahmad et al. argue that the mixture of experts architecture on top of these ‘selects an expert network that is optimal for the designated subtask’ (Ahmad et al., 2002, p. 189), delegating each of the subtasks (the production of words and of indicating acts) to the appropriate sub-network. However, a Jordan network is an extension of a feed-forward network. With its recurrent (context) connections set to zero, it reduces to an ordinary feed-forward network, so it is perfectly capable of learning exactly the same mappings as the latter. This suggests that such a solution is more complicated than necessary. Worse still, since the gestures and the recitation of the number words are implemented by separate neural networks, the model design prevents the internal representations employed in those tasks from interacting. This is in stark contrast with the behavioural data reviewed in chapter 2, which indicate that, during learning to count, gestures substantially affect counting performance. The situation just described is a prime example of the problem discussed when introducing design assumption 3 at the beginning of this chapter. The architecture of Ahmad et al. is undoubtedly capable of exhibiting counting-like behaviour. However, its unnecessary complexity limits its usefulness as an aid in the cognitive study of counting. The design of the model of counting proposed in this thesis is significantly simpler than that of Ahmad et al. One of the design objectives was to endow the model with the capabilities necessary to perform the task, but not to bias its operation without sufficient justification.
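The argument that a Jordan network can learn any mapping a feed-forward network can learn is easy to check numerically. The sketch below uses a simplified Jordan network: it has no biases, and its context units simply hold a copy of the previous output, without the self-decaying connections of Jordan's original formulation. Setting the context-to-hidden weights `w_ctx` to zero makes the output at every step identical to that of a plain feed-forward network with the same input and output weights. All names and weight values are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def feedforward(x, w_in, w_out):
    """Plain feed-forward network: logistic hidden layer, linear output."""
    hidden = [logistic(a) for a in matvec(w_in, x)]
    return matvec(w_out, hidden)

def jordan_run(xs, w_in, w_ctx, w_out):
    """Run a simplified Jordan network over the input sequence xs."""
    context = [0.0] * len(w_out)   # one context unit per output unit
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_ctx, context))]
        hidden = [logistic(a) for a in pre]
        y = matvec(w_out, hidden)
        outputs.append(y)
        context = y                # previous output fed back as context
    return outputs

w_in = [[0.4, -0.3], [0.1, 0.8], [-0.6, 0.2]]   # 3 hidden x 2 inputs
w_out = [[1.0, -1.0, 0.5]]                      # 1 output x 3 hidden
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

ff = [feedforward(x, w_in, w_out) for x in xs]
zero_ctx = jordan_run(xs, w_in, [[0.0], [0.0], [0.0]], w_out)
live_ctx = jordan_run(xs, w_in, [[1.0], [1.0], [1.0]], w_out)
print(zero_ctx == ff)   # True: with zero context weights the mappings coincide
```

With non-zero `w_ctx`, the first output still matches, because the context starts at zero. Later outputs, however, depend on the network's own previous output. This is the extra capacity that the recurrent network adds on top of the feed-forward mapping.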
Finally, it is worth highlighting an important feature of the design of the proposed model of learning to count: its considerable flexibility, which allows a wide array of modelling scenarios to be tested in a unified way. This flexibility has at least two dimensions. The first is connected with the set-up of the neural network: the counting gestures may be either an input to or an output from the model (cf. section 5.2.2). The second results from the model's compatibility with a variety of representations. As a result, the proposed neural network can be used to compare the consequences of different approaches to representing information in the context of counting. This applies equally to the representation of speech, counting gestures and visual information. Since several different approaches are possible for each modality, investigating all realisable scenarios was unfortunately not possible within the scope of the present thesis. It is therefore important to keep in mind that the utility of the proposed model goes beyond the few research questions considered herein.
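The first dimension of flexibility can be sketched as a configuration step that decides which side of the network the gesture units are attached to. The chosen representation of each modality then only changes how many units that modality occupies. The sketch is hypothetical. The `Modality` class, the assumption that vision is always an input and speech always an output, and the unit counts are placeholders rather than values from the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Modality:
    """A hypothetical modality and the number of units its encoding needs."""
    name: str
    size: int

def layer_sizes(visual, speech, gesture, gesture_as_input):
    """Return (n_input_units, n_output_units) for one model configuration.

    Assumption of this sketch: vision is always an input and speech (number
    words) always an output; the gesture modality goes on either side.
    """
    inputs = [visual] + ([gesture] if gesture_as_input else [])
    outputs = [speech] + ([] if gesture_as_input else [gesture])
    return sum(m.size for m in inputs), sum(m.size for m in outputs)

# Placeholder sizes; swapping a representation only changes these numbers.
vision = Modality("vision", 25)
speech = Modality("speech", 10)
gesture = Modality("gesture", 4)

print(layer_sizes(vision, speech, gesture, gesture_as_input=True))   # (29, 10)
print(layer_sizes(vision, speech, gesture, gesture_as_input=False))  # (25, 14)
```

Because the gesture representation enters only through its size, alternative encodings can be compared without altering the rest of the set-up. Examples include a proprioceptive encoding based on joint angles and a location-based encoding.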
Summarising, the model presented in this chapter constitutes a novel contribution to the state of the art in modelling the role of gestures in learning to count, in that:
this is the first cognitive model in the context of mathematical cognition designed according to the developmental cognitive robotics paradigm (cf. section 4.2), closely linked with an artificial body to represent embodied phenomena.
Considering the ample evidence for the embodied nature of human numerical knowledge in general, and counting in particular (cf. chapter 2), this is an important step forward;
as a consequence of the above, this is the first model to employ a realistic representation of the counting gestures, based on the actual pointing performed by a humanoid robot endowed with dexterous arms designed to resemble human arms as closely as possible (cf. section 4.3);
it is the first model of the contribution of the counting gestures to learning to count. Although one of the previously published models of counting incorporated gestures (Ahmad et al., 2002), several design decisions in that study severely biased the ways in which the gestures could affect the counting process. One of the aims of the present work is to minimise such biases;
the proposed model is conceptually simple, yet flexible with respect to the representations of the various aspects of counting (such as the encoding of the visual, proprioceptive and verbal information), which allows a wide variety of hypotheses to be tested using the model.
The proposed model is investigated in a series of simulations, which are described, together with their results, in the following chapter.