
Simulations of Learning to Count using the Robotic Model

6.1 Simulation 1 — Learning Number Words

6.1.1 Aims of the Experiment

The aim of this simulation was to examine the progress of the first stage of the model's training. As explained in section 5.3.1, this preliminary stage was introduced to reflect the ontogeny of counting in children, who are able to recite the count list quite well before they start learning to count things. In line with this, before the model is trained to count using vision and gestures, it is equipped with the ability to produce a sequence of number words, as if learnt by rote. This simulation provided basic information about the model and its training that was useful in determining the parameter values for subsequent simulations, such as the number of hidden units in the network and the length of the training. Although, as explained in chapter 5, detailed modelling of the learning of the count list was not among the main goals of the model, it was nevertheless interesting to compare the behaviour of the model with human data.

6.1.2 Procedure

In this simulation experiment only the first stage of the training was performed. The model was therefore used in its initial configuration, which does not include any of the optional modules (see figure 12). It was assumed that the model is trained to recite N_VO = 10 number words. A generous amount of training was given to the model (10000 epochs) in order to ensure that the learning process had enough time to converge. The other training parameters were as reported in section 5.3.1.

The number of hidden units in the network, N_H, was the main parameter that was varied in order to investigate how it affects the results of the training. Based on earlier informal trial-and-error experiments with the model in various configurations, the following values of N_H were chosen for systematic investigation: 6–12, 15, 20, and 25. The 6–12 range was estimated to contain the minimum number of units required for the network to successfully acquire the preliminary skill, and the simulation aimed to determine the exact value of this limit. The latter three values were the numbers of hidden units chosen for the subsequent simulations, which include the second stage of the model training. These were considered in the present simulation in order to establish the required duration of the preliminary training.

After the training of the model was completed, the neural network was evaluated in order to determine whether the training had been successful, using the criteria described in section 5.3.1. In addition, information about the learning progress was collected during the training. After every training epoch, the mean-squared error over the training data set and the sequence of number words produced by the model in response to the trigger input were recorded. If the produced count list differed from the previously recorded one, the number of the epoch at which the change occurred was recorded. Using this information it is possible to determine the epoch of the training at which each number word was ‘acquired’ by the network, as well as the number of training epochs necessary for the model to master the complete count list.
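The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: it assumes that a number word counts as ‘acquired’ from the epoch after which it is produced at its own time step in every later record, and the names `SILENCE`, `count_list`, `acquisition_epochs` and `mastery_epoch` are hypothetical. The change log is transcribed from figure 20 (trial 075), with 0 standing for silence.

```python
SILENCE = 0  # hypothetical code for "no number word produced" (the dots in figure 20)


def count_list(n_words, n_steps=20):
    """Output sequence with the first n_words correct and silence afterwards."""
    return list(range(1, n_words + 1)) + [SILENCE] * (n_steps - n_words)


# Change log transcribed from figure 20: (epoch, output sequence over 20 time steps).
FIGURE_20_LOG = [
    (1, [4] * 20),
    (2, [7, SILENCE, 7] + [SILENCE] * 17),
    (3, count_list(0)),
    (4, [3] + [SILENCE] * 19),
    (5, count_list(0)),
    (31, count_list(1)), (64, count_list(2)), (73, count_list(3)),
    (132, count_list(4)), (185, count_list(5)), (244, count_list(6)),
    (309, count_list(7)), (333, count_list(8)),
    (363, count_list(9)), (364, count_list(8)), (365, count_list(9)),
    (367, count_list(8)), (368, count_list(9)),
    (612, count_list(10)), (613, count_list(9)), (627, count_list(10)),
]


def acquisition_epochs(change_log, n_words):
    """Map each number word to the epoch from which it is produced consistently.

    change_log must be sorted by epoch and contain one record per output change.
    Words that are not correct in the final record are left out of the result.
    """
    streak_start = {}
    for epoch, sequence in change_log:
        for word in range(1, n_words + 1):
            if sequence[word - 1] == word:
                # Keep the epoch at which the current correct streak began.
                streak_start.setdefault(word, epoch)
            else:
                # The word was lost (or never produced): forget any earlier streak.
                streak_start.pop(word, None)
    return streak_start


def mastery_epoch(change_log, n_words):
    """Epoch at which the whole count list is in place, or None if never."""
    acquired = acquisition_epochs(change_log, n_words)
    if len(acquired) < n_words:
        return None
    return max(acquired.values())
```

Applied to the figure 20 log with 10 number words, this definition assigns ‘nine’ to epoch 368 rather than 363, because the word is lost again at epochs 364 and 367, and it places mastery of the complete list at epoch 627.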

For each of the considered sizes of the hidden layer, 10 independent repetitions of the training were performed with different initial connection weights, in order to estimate the training success rate for each hidden layer size. Since 10 distinct values of N_H were considered, this resulted in a total of 100 repetitions of the simulation.
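The resulting experimental design can be enumerated in a few lines. The sketch below is illustrative only: the sequential trial numbering and the use of the trial number as a random seed are assumptions, and `build_trials` is a hypothetical helper. The numbering does, however, agree with the trial labels in the figure captions (trial 028 with N_H = 8, trial 042 with N_H = 10, trial 075 with N_H = 15).

```python
from itertools import product

HIDDEN_SIZES = list(range(6, 13)) + [15, 20, 25]  # 6-12, 15, 20, 25
REPETITIONS = 10


def build_trials(hidden_sizes=HIDDEN_SIZES, repetitions=REPETITIONS):
    """List one record per training run, grouped by hidden layer size."""
    trials = []
    grid = product(hidden_sizes, range(repetitions))
    for trial_id, (n_hidden, _rep) in enumerate(grid, start=1):
        trials.append({
            "trial": trial_id,       # assumed numbering: 1..100
            "n_hidden": n_hidden,
            "seed": trial_id,        # assumed: a distinct seed per run
        })
    return trials
```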

6.1.3 Results

The number of trials in which the training was successful for each of the considered hidden layer sizes N_H is reported in table 1.
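The counts in table 1 can be converted into success rates with a short script. This is a sketch under one stated assumption: ‘reliable’ is taken to mean that all 10 trials succeed for a given size and for every larger size tested. The function names are hypothetical; the counts are those reported in table 1.

```python
TRIALS_PER_SIZE = 10

# Successful trials per hidden layer size, as reported in table 1.
SUCCESSFUL_TRIALS = {6: 0, 7: 0, 8: 0, 9: 3, 10: 8, 11: 10, 12: 10, 15: 10, 20: 10, 25: 10}


def success_rates(successes, trials=TRIALS_PER_SIZE):
    """Fraction of successful trials for each hidden layer size."""
    return {n_hidden: count / trials for n_hidden, count in successes.items()}


def smallest_reliable_size(successes, trials=TRIALS_PER_SIZE, threshold=1.0):
    """Smallest size such that it and all larger tested sizes reach the threshold."""
    reliable = None
    for n_hidden in sorted(successes, reverse=True):
        if successes[n_hidden] / trials < threshold:
            break
        reliable = n_hidden
    return reliable
```

With the table 1 counts, the success rate is 0.3 for N_H = 9 and 0.8 for N_H = 10, and the smallest size that meets the 100% criterion is N_H = 11.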

To illustrate the typical progress of number word learning in the proposed model, the evolution of the network output throughout the first stage of training is reported for three experiment trials in figures 18, 19, and 20. These figures show each output sequence together with the number of the training epoch in which it first appeared. Numbers 1–10 designate the corresponding number words, while the dots represent silence (all output units of the network deactivated). The three instances of the training were chosen as representing typical model behaviour. Figure 18 represents an unsuccessful training case. The training in the trials illustrated in figures 19 and 20 was successful, but the latter progressed faster than the former, due to the larger size of the hidden layer.

Figure 18: Number words learning progress in trial 028 (N_H = 8, training not successful). Columns: Epoch, Model Output.

Increasing the size of the hidden layer affected the speed with which the model acquired the number words. Figure 21 shows, for trials with 15, 20 and 25 hidden units, the distributions of the training epochs from which the network began to use each number word correctly.

6.1.4 Discussion

The success rates reported in table 1 indicate that at least 11 units in the hidden layer of the proposed neural network model are necessary for it to reliably learn a sequence of 10 number words. Although some successful attempts appear for N_H as small as 9, the low success rate suggests that in this case the training algorithm easily gets stuck in local optima of the training error landscape. These results are not surprising considering the type of output layer employed (i.e. linear), which requires the states of the hidden layer units to have a topology analogous to that of the output vectors they correspond to. Since all output vectors that represent number words are orthogonal to each other (cf. section 5.2.3), the same has to be true for their corresponding internal representations, and since N_VO = 10 in the simulation, this requires the internal representation space to be at least 10-dimensional. The occasional successful training attempts in the 9-dimensional case can be explained by the fact that the employed model evaluation protocol, based on nearest-neighbour classification, allows for a degree of imprecision at the network output. The 10 internal states in those successful cases must have been packed into the 9-dimensional space in such a way that, although not exactly orthogonal to each other, they were close enough to orthogonal.
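The tolerance of nearest-neighbour evaluation can be illustrated with a small numerical sketch. It is a hand-built example, not a reconstruction of what the trained network found. It assumes one-hot target vectors (one particular orthogonal coding), Euclidean distance, and silence represented by the zero vector. Each target is replaced by its projection onto the 9-dimensional subspace orthogonal to the all-ones vector, which is the kind of output set a linear layer fed by 9 hidden units could in principle produce.

```python
import math

N_WORDS = 10


def one_hot(index, n=N_WORDS):
    """Assumed target coding: unit vector for the number word at `index`."""
    return [1.0 if j == index else 0.0 for j in range(n)]


def squashed_output(index, n=N_WORDS):
    """One-hot target with its mean removed.

    All such vectors sum to zero, so the 10 of them span only 9 dimensions.
    """
    return [x - 1.0 / n for x in one_hot(index, n)]


def nearest_word(output, n=N_WORDS):
    """Index of the closest target; index n stands for silence (zero vector)."""
    candidates = [one_hot(i, n) for i in range(n)] + [[0.0] * n]
    distances = [math.dist(output, c) for c in candidates]
    return distances.index(min(distances))
```

Every squashed vector is still classified as its own number word, since its squared distance to the correct target is 0.1, compared with 0.9 to silence and 2.1 to any other word. This is consistent with the occasional successes at N_H = 9, although it says nothing about how easily gradient-based training reaches such a configuration.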

There are several characteristic features of the progress of the number word learning in the proposed model that can be observed in figures 18, 19, and 20.

Initially, the output of the model is a random number word that usually does not change over time. This is a consequence of the random initialisation of the network weights before the training. After a few training epochs,

Figure 19: Number words learning progress in trial 042 (N_H = 10, training successful). Columns: Epoch, Model Output.

N_H                 6   7   8   9   10  11  12  15  20  25
Successful trials   0   0   0   3   8   10  10  10  10  10

Table 1: Number of trials (out of 10) in which the preliminary training stage was successful for the considered hidden layer sizes N_H

Epoch  Model Output
1      4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
2      7 · 7 · · · · · · · · · · · · · · · · ·
3      · · · · · · · · · · · · · · · · · · · ·
4      3 · · · · · · · · · · · · · · · · · · ·
5      · · · · · · · · · · · · · · · · · · · ·
31     1 · · · · · · · · · · · · · · · · · · ·
64     1 2 · · · · · · · · · · · · · · · · · ·
73     1 2 3 · · · · · · · · · · · · · · · · ·
132    1 2 3 4 · · · · · · · · · · · · · · · ·
185    1 2 3 4 5 · · · · · · · · · · · · · · ·
244    1 2 3 4 5 6 · · · · · · · · · · · · · ·
309    1 2 3 4 5 6 7 · · · · · · · · · · · · ·
333    1 2 3 4 5 6 7 8 · · · · · · · · · · · ·
363    1 2 3 4 5 6 7 8 9 · · · · · · · · · · ·
364    1 2 3 4 5 6 7 8 · · · · · · · · · · · ·
365    1 2 3 4 5 6 7 8 9 · · · · · · · · · · ·
367    1 2 3 4 5 6 7 8 · · · · · · · · · · · ·
368    1 2 3 4 5 6 7 8 9 · · · · · · · · · · ·
612    1 2 3 4 5 6 7 8 9 10 · · · · · · · · · ·
613    1 2 3 4 5 6 7 8 9 · · · · · · · · · · ·
627    1 2 3 4 5 6 7 8 9 10 · · · · · · · · · ·

Figure 20: Number words learning progress in trial 075 (N_H = 15, training successful)

Figure 21: Progress of number words learning across 10 trials for the neural networks with 15 (a), 20 (b), and 25 (c) units in the hidden layer. The box plots illustrate the distributions, across the 10 trials, of the epochs in which the number words were acquired by the model. The boxes extend from the lower to the upper quartile of the data, the whiskers extend to the most extreme data points (limited to 1.5 times the interquartile range), the grey line indicates the median, and outliers are shown using the + symbol.

this random output is suppressed and the model begins to learn the correct count list, starting from the beginning. Normally the learning progresses through consecutive number words, but occasionally a number word is acquired outside of the conventional sequence (e.g. number word ‘six’ in figure 18, epoch 656). This phenomenon was infrequent and tended to occur for the lower values of N_H. One of the most distinctive features of the learning is the prolonged phases in which a number word is alternately produced and omitted by the model before it starts to appear at the output consistently (e.g. in figure 19 this is evident for number words ‘three’, ‘four’, ‘six’, ‘eight’, ‘nine’, and ‘ten’). As more hidden units are added to the model, these phases become visibly shorter, but they can still be observed even for N_H larger than the required minimum (cf. figure 20). It is important to note that even for small hidden layers and in cases of unsuccessful training, the model did not produce number words outside of their corresponding time steps, except in the very early stages of the training. In the cases of unsuccessful training the model did not manage to learn all 10 number words, but only an initial portion of the count list (see e.g. figure 18).

The progress of the learning of the count list by the proposed model resembles, to a limited degree, the equivalent process in children (cf. section 2.3.2). Since the employed neural network framework is deterministic once trained (i.e. the output produced by the model is always the same for the same input to the network), the stochastic within-subject effects present in children's behaviour, such as the unstable non-conventional portion of the count list, are not reproduced. As illustrated in figures 18, 19, and 20, the model also does not seem to have a tendency to emit stable non-conventional sequences. Rather, at each time step of the simulation the model either produces a correct number word or does not utter anything (the exception, as mentioned above, being the very early epochs of the training). The behaviour of the model is, however, consistent with that of children with respect to the acquisition of the stable conventional portion of the count list. The count list produced by the model starts with small numbers and is gradually extended toward the larger ones. Furthermore, prolonged phases during which the most recent extension is consolidated can be clearly distinguished (see e.g. figure 19; cf. Fuson et al., 1982).

Based on figure 21 it is possible to determine the required duration of the preliminary training stage for subsequent simulations. Since models with N_H = 15, 20 and 25 are used in the further experiments (with N_H acting as a between-subjects factor), the number of epochs in the first stage of the training should be sufficient to ensure reliable success of this stage for all considered values of N_H. As figure 21 indicates, the amount of training required for the model to learn to recite 10 number words is the largest for N_H = 15. Since in 9 out of 10 cases 700 epochs were sufficient for the model to master the entire count list (there was only one outlier case, for which around 900 epochs were necessary; see figure 21a), this number has been used as the duration of the preliminary stage in all subsequent simulations.

6.2 Simulation 2 — Impact of the Preliminary

In document Modelling Learning to Count in Humanoid Robots (Page 175-184)