Working memory - On Recurrent and Deep Neural Networks

The concept of working memory (WM) refers to, following the definition from Durstewitz et al. (2000), “the ability to transiently hold and manipulate goal-related information to guide forthcoming actions.” To these properties we believe an important addition is stability, or at least that the information can be stabilized on demand, e.g. by rehearsal.

Figure 5.2: Bifurcation map for the logistic map. Each vertical line of the figure corresponds to the phase portrait for a certain value of the parameter θ. We only plot the different attractor sets of the phase portrait. Note how for θ < 2.9 the system has a single point attractor. For θ > 2.9 the single point attractor turns into a 2-state periodic attractor, which ends up in a chaotic regime when θ > 3.569.
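The regimes named in the caption of Figure 5.2 can be reproduced numerically with a few lines of code. The sketch below assumes the standard form of the logistic map, x ← θ·x·(1 − x); the function names, the initial value, the burn-in length and the rounding precision are illustrative choices rather than anything taken from the text.

```python
def logistic_step(x, theta):
    """One iteration of the logistic map x -> theta * x * (1 - x)."""
    return theta * x * (1.0 - x)


def attractor_points(theta, x0=0.2, burn_in=2000, keep=64, decimals=6):
    """Approximate the attractor set reached from x0 for a given theta.

    The first `burn_in` iterations are discarded as transient; the next
    `keep` iterates are rounded and de-duplicated, so a periodic orbit
    collapses to as many points as its period.
    """
    x = x0
    for _ in range(burn_in):
        x = logistic_step(x, theta)
    points = set()
    for _ in range(keep):
        x = logistic_step(x, theta)
        points.add(round(x, decimals))
    return sorted(points)


if __name__ == "__main__":
    # One value of theta per regime: point attractor, 2-cycle, chaos.
    for theta in (2.5, 3.2, 3.9):
        print(theta, len(attractor_points(theta)))
```

Each call corresponds to one vertical line of the bifurcation map: a single returned point indicates a point attractor, two points a 2-state periodic attractor, and a large number of distinct points suggests the chaotic regime.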

Let us rephrase this definition in specific terms related to recurrent models. Working memory describes the ability of the model to remember information not by storing it in its weights, but rather in its hidden state. That is, the model should be able to memorize sequences that it has not seen during training, provided some cue signals indicate that such information will be useful further on. The memory trace should only exist during the evaluation of the model on a particular input sequence and not be part of the model.

The information stored has to be useful for the task to be solved and should continuously influence or define the behaviour of the model on incoming inputs. In principle, this information should be stored as long as needed, potentially for the entire length of the sequence.

A large portion of the RNN-oriented literature on working memory is concerned with models that explain behavioural and neural observations from cognitive processing experiments (reviews: Durstewitz et al. (2000); Howard (2009)).

A common belief is that stable short-term memory for RNNs is realized through attractors. Following this intuition, a large number of attractors have been explored in various contexts.

The biologically oriented literature relies on point attractors (cell assemblies and bistable neurons) and traveling waves (synfire chains) as discussed in Durstewitz et al. (2000). In theoretical physics, spatiotemporal attractors (or pattern formations in excitable media) are explored, with connections into computational neuroscience and robotics via neural field theories of cortical representation (Schöner et al., 1995; Freeman, 2007a,b). Coupled oscillators, which can be connected to attractors in spiking neural networks, are also thoroughly studied; see, for example, Radicchi and Meyer-Ortmanns (2006).

Chaotic attractors have been investigated as an information-representing neural mechanism in Yao and Freeman (1990) and Babloyantz and Lourenço (1994). Such models of memory offer a rich structure (due to the complexity of the chaotic attractor) and the possibility to stabilize or address sub-lobes as representational units (Stollenwerk and Pasemann, 1996; Tsuda, 2001).

One fundamental flaw of the attractor view of working memory is that, by definition, attractors keep the system trajectory confined to their support (that is, the model cannot leave the set of points defining the attracting set). Cognitive dynamics do not seem to have this property. We can forget information, for example, which would be equivalent to leaving the support set of the attractor representing the stored information.

An important challenge for this view is, therefore, explaining how these attractors can be left. Many possible answers have been proposed. One proposed solution is neural noise, which can kick the trajectory out of an attractor. This solution is, however, unsatisfactory. Noise cannot be specific, and forgetting information does not seem to be a random process. We need a more controlled (and input-driven) mechanism to leave these attractors.
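To make the contrast between confinement and input-driven escape concrete, consider a toy example that is not taken from the literature cited above: a single tanh unit with a self-connection. The update rule, the weight w = 2 and the pulse amplitudes are all assumptions chosen only so that the unit has two stable point attractors, one positive and one negative.

```python
import math


def run_bistable_unit(inputs, h0=0.5, w=2.0):
    """Iterate h <- tanh(w * h + u) over a sequence of scalar inputs u.

    With w = 2 and no input, the unit has two stable fixed points of
    opposite sign, so the sign of h acts as a one-bit memory that lives
    in the state rather than in the weight.
    """
    h = h0
    trace = []
    for u in inputs:
        h = math.tanh(w * h + u)
        trace.append(h)
    return trace


settle = [0.0] * 50

# A weak perturbation: the state is pushed but falls back into the same basin.
weak = run_bistable_unit(settle + [-0.5] + settle)

# A strong, targeted input pulse: the state is moved into the other basin.
strong = run_bistable_unit(settle + [-3.0] + settle)

if __name__ == "__main__":
    print("before pulse:", weak[49], strong[49])
    print("after weak pulse:", weak[-1])
    print("after strong pulse:", strong[-1])
```

Without input the state never leaves the positive attractor, and a small disturbance is absorbed. Only a sufficiently strong input moves the state to the negative attractor, which is the kind of specific, input-controlled transition that random noise cannot provide.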

If we allow ourselves to move away from the standard definition of an attractor, several attractor-like phenomena have also been considered as realisations of memory. Such phenomena usually emerge in high-dimensional nonlinear dynamics: saddle point dynamics (Rabinovich et al., 2008; Sussillo and Barak, 2013); attractor relics (or attractor ruins), where classical attractors in a fast-timescale subsystem are destroyed by slow-timescale saturation dynamics (Gros, 2009); transient attractors, defined by transient volume contractions of a flow (Jaeger, 1995); unstable attractors, which are classical attractors that appear in certain spiking neural networks and can be left under the impact of arbitrarily small noise because they are surrounded arbitrarily closely by basins of other attractors (Timme et al., 2002); high-dimensional attractors (initially named partial attractors), which govern only a subset of the dimensions of a high-dimensional phase space (Maass et al., 2007); and attractor landscapes shaped by a control parameter (input), which describe the dynamics of a system in which attractors appear and disappear due to incessant bifurcations (Negrello and Pasemann, 2008).

Figure 5.3: Diagram of the WM model, showing the input units (u), the reservoir (h), the output units (y) and the WM units (m). Dashed connections are trained; the others are left untouched. Note that the main difference from a standard echo state network is the presence of memory units. These units differ from output units by having trainable connections among themselves. Also, in our setup the output units do not have feedback connections.
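The layout described in the caption of Figure 5.3 can be sketched in code. What follows is a minimal illustration under several assumptions that are not stated in the text: the tanh nonlinearities, the weight scales, the feedback from the WM units into the reservoir, and the exact set of connections treated as trainable. The class and attribute names (`WMEchoStateSketch`, `W_in`, `W_res`, `W_fb`, `W_wm_h`, `W_wm_m`, `W_out`) are hypothetical, and no learning rule is implemented.

```python
import math
import random


def rand_matrix(rows, cols, scale, rng):
    """Random matrix as a list of rows, entries uniform in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vecs):
    """Element-wise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vecs)]


class WMEchoStateSketch:
    """Echo-state-style network with extra working-memory (WM) units."""

    def __init__(self, n_in, n_res, n_wm, n_out, seed=0):
        rng = random.Random(seed)
        # Fixed connections (assumed untrained).
        self.W_in = rand_matrix(n_res, n_in, 0.5, rng)    # u -> h
        self.W_res = rand_matrix(n_res, n_res, 0.1, rng)  # h -> h
        self.W_fb = rand_matrix(n_res, n_wm, 0.5, rng)    # m -> h (assumed feedback)
        # Connections assumed trainable; left at random initial values here.
        self.W_wm_h = rand_matrix(n_wm, n_res, 0.5, rng)  # h -> m
        self.W_wm_m = rand_matrix(n_wm, n_wm, 0.5, rng)   # m -> m
        self.W_out = rand_matrix(n_out, n_res, 0.5, rng)  # h -> y, no feedback
        self.h = [0.0] * n_res
        self.m = [0.0] * n_wm

    def step(self, u):
        """Advance the state by one input vector u and return the output y."""
        pre_h = add(matvec(self.W_in, u),
                    matvec(self.W_res, self.h),
                    matvec(self.W_fb, self.m))
        h_new = [math.tanh(a) for a in pre_h]
        pre_m = add(matvec(self.W_wm_h, h_new), matvec(self.W_wm_m, self.m))
        m_new = [math.tanh(a) for a in pre_m]
        y = matvec(self.W_out, h_new)  # y is read out only, never fed back
        self.h, self.m = h_new, m_new
        return y
```

Because the output is a pure readout in this sketch, changing `W_out` leaves the trajectories of `h` and `m` untouched, whereas changing `W_wm_m` alters how the WM units sustain their own state. That is the structural distinction between output and memory units that the caption points to.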

This is only a subset of all the different proposed mechanisms for working memory. Biological brains might also end up using several of these phenomena simultaneously.

None of the above-mentioned mechanisms seems to address all the properties that we would expect from working memory. In what follows we start by providing a specific structure and learning rule that results in echo state models that can exhibit memory. This will serve as a proof of concept that, while the mechanism behind this behaviour is not clear, the desired behaviour can be obtained.
