Dynamic coding in a large-scale, functional, spiking-neuron model

Bachelor’s Project Thesis Loran Knol Supervisor: Dr J.P. Borst

Abstract: A functional, large-scale, spiking-neuron model of working memory (WM) was adapted to display the patterns often cited in EEG studies as evidence of dynamic coding. The model had a mechanism for temporarily adjusting its own inter-neuron connection strengths following network activation, which served as the main memory mechanism. Dynamic coding is a phenomenon observed in the human brain in which information is represented in a particular way at time step t but differently at time step t + 1, while, crucially, the information itself does not change. A previous human EEG study obtained data that showed such dynamic coding patterns. Two experiments from that study were conducted with the model, and the model results were compared to the corresponding human data. The comparisons showed that the model and human coding patterns display many similarities, but also some differences. Moreover, the model did not perform as well as humans did. Nevertheless, it was concluded that the model did in fact display dynamic coding, which would mean that dynamic coding might simply be a property of any self-modifying network. This calls for a perspective on dynamic coding that is slightly more modest than what is suggested in the existing literature.

1 Introduction

Working memory (WM) is an important part of human cognition (Daneman and Carpenter, 1980). It is involved in maintaining and/or modifying information from the senses or memory in order to complete a specific task (Baddeley, 1992). Although most researchers previously agreed that WM information would be maintained with the help of sustained neuronal firing patterns in the lateral prefrontal cortex (lPFC) (e.g. Curtis and D’Esposito, 2003), more recent studies have noted a decrease of such WM-specific patterns while information maintenance was still required (Sreenivasan, Curtis, and D’Esposito, 2014). Curiously, this decrease would appear even when tasks were executed correctly (Watanabe and Funahashi, 2014), leading to the idea that WM might sometimes function in an effectively activity-silent way.

Previous studies (e.g. Wolff, Jochim, Akyürek, and Stokes, 2017) have demonstrated that when WM-specific activity is present (prior to decreasing to noise levels), it has the interesting property of not statically representing the information it encodes. Instead, EEG measurements show that information is represented dynamically, meaning that while information is encoded in a certain way at time point t, it may be encoded completely differently at time point t + 1; crucially, however, the information itself, that which is encoded, does not change.

Stokes (2015) (preceding the study from Wolff et al. (2017), which is just one example of a study incorporating dynamic representations) has attributed these dynamic properties to complex interactions between a network’s activity state and its underlying ‘hidden’ state, where the latter refers to the collection of neurophysiological parameters that determine the network’s behaviour (e.g. the amount of calcium at a neuron’s presynaptic terminal at a given time). It is called ‘hidden’ because those parameters, as opposed to network activity, are typically not measured. According to Stokes, a certain stimulus would invoke some activity state under influence of an already present hidden state. That invoked activity state then modifies the initial hidden state, which causes (through recurrent connections) a new activity state, which again modifies the hidden state, and so on. This interaction would happen on “the very shortest timescales”, according to Stokes.

The present study, however, proposes a model that displays the patterns found in EEG studies cited as evidence of dynamic coding, but through a simpler process. More specifically, an existing spiking neural model designed by Pals, Stewart, Akyürek, and Borst (2020) was adapted. It uses the Nengo framework as designed by Eliasmith (2013). While the model does feature a form of hidden states as mentioned above, it does not incorporate the extensive reciprocal interaction between hidden and active states that Stokes envisioned. In the following sections, the original Pals et al. model will be discussed in more detail, along with the Wolff et al. (2017) experiments with which it was tested. After that, the model adaptations made in this study will be considered, followed by an explanation of certain analyses (employed by Wolff et al.) that test for dynamic coding. Finally, the results of the analyses of the adapted model are compared to the results from Wolff et al., and the capability of the adapted model to display dynamic coding, as well as its implications, are discussed.

1.1 The Pals et al. model

This study takes a model developed by Pals et al. (2020) as its starting point. The model is a large-scale, spiking-neuron model that tries to explain activity-silent WM in the context of functional behaviour.

1.1.1 Short-term synaptic plasticity

To achieve a functional model of activity-silent WM, Pals et al. implemented a mechanism known as short-term synaptic plasticity (STSP) (Zucker and Regehr, 2002). STSP refers to a phenomenon, shown by many neurons, in which their firing facilitates (enhances) the connection between pre- and postsynaptic neurons for no more than a few minutes (Zucker and Regehr, 2002). This essentially allows neuronal network activity to alter the network structure and have these alterations persist even after the activity has disappeared. These structural alterations can be considered a method for holding information, which is an important function of WM. Taking that into account, it is clear why STSP is one of the candidate mechanisms for activity-silent WM.

However, changes in the neuronal network structure are often not measured in neuroimaging studies; it is far more common to measure network activity. For this reason, the variables that determine the network structure (and the exact workings of the STSP mechanism) are often called hidden variables. Two hidden variables in particular are hypothesised to determine the presynaptic neuron’s predisposition to fire on a short timescale (Zucker and Regehr, 2002; Mongillo, Barak, and Tsodyks, 2008). One is the build-up of calcium ions at the presynaptic terminal: this build-up happens due to presynaptic firing and facilitates subsequent release of neurotransmitter. The second variable is the amount of neurotransmitter available for release at the presynaptic terminal (the resource variable), which depletes due to presynaptic firing. While the calcium build-up declines over time (typically a second), the amount of available neurotransmitter increases. Effectively, the calcium is what enables the facilitation, and the resource variable, in turn, limits the facilitation (i.e. prevents unlimited calcium build-up). Together, they form a calcium-mediated version of the STSP mechanism described above. It is this calcium-mediated STSP mechanism that was implemented in the Pals et al. model.
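The interplay between the two hidden variables can be illustrated with a small simulation. The sketch below follows the general form of the calcium-mediated model described by Mongillo et al. (2008), with u standing for the calcium-driven facilitation and x for the available resources. The function name `simulate_stsp`, the Euler integration and all parameter values are illustrative assumptions; they are not taken from the Pals et al. model.

```python
import numpy as np

def simulate_stsp(spike_times, t_end=3.0, dt=0.001, U=0.2, tau_f=1.5, tau_d=0.2):
    """Euler simulation of two hidden synaptic variables.

    u: calcium-driven facilitation (rises with each presynaptic spike).
    x: fraction of neurotransmitter resources available (drops with each spike).
    All parameter values are illustrative assumptions.
    """
    n_steps = int(round(t_end / dt))
    spike_steps = {int(round(t / dt)) for t in spike_times}
    u = np.empty(n_steps)
    x = np.empty(n_steps)
    u_now, x_now = U, 1.0  # resting state: baseline calcium, full resources
    for k in range(n_steps):
        # Passive relaxation towards the resting values.
        u_now += dt * (U - u_now) / tau_f
        x_now += dt * (1.0 - x_now) / tau_d
        if k in spike_steps:
            u_now += U * (1.0 - u_now)  # calcium build-up (facilitation)
            x_now -= u_now * x_now      # resource depletion
        u[k], x[k] = u_now, x_now
    return u, x

# A short 40 Hz burst of presynaptic spikes, followed by silence.
burst = np.arange(0.1, 0.35, 0.025)
u, x = simulate_stsp(burst)
efficacy = u * x  # effective synaptic strength relative to its maximum
```

With these assumed time constants the resources recover quickly after the burst while the calcium trace decays slowly, so the synapse stays facilitated for a while without any ongoing activity, which is the property that makes STSP a candidate for activity-silent storage.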

1.1.2 The Wolff et al. study

This model was designed to use STSP to account for results from an EEG experiment conducted by Wolff et al. (2017). Wolff and colleagues developed a perturbation approach to measure the hidden states of activity-silent WM, i.e. they ‘pinged’ the brain with a non-specific stimulus so the neural WM networks could ‘echo’ the information their hidden states contained (not unlike sonar). To demonstrate this approach, they conducted three delayed response experiments, the first two of which are relevant for this study. In both of these experiments, participants had to maintain randomly oriented gratings in memory (memory items) to be able to compare them to another grating presented at the end of the trial (test item), after some delay. During this delay, the participants were ‘pinged’ with a vivid image called an impulse, which would presumably reactivate their activity-silent WM representations. EEG measurements were conducted on the participants during both experiments.

Using information decoding analyses on those EEG data, Wolff and colleagues found that the orientation of the memory item gratings could not only be decoded immediately after initial stimulus presentation (when the items were stored in WM), but also after impulse presentation, consistent with the idea that a ‘ping’ to WM would elicit an ‘echo’. All of this was taken as evidence that WM could save information in an activity-silent way.

In addition, Wolff et al. conducted cross-temporal decoding analyses (CTDAs) on their data recorded just after memory item presentation, one of which is depicted in Figure 1.1. The figure shows, for every time point t_i, how well information from all other time points t_0, ..., t_n could be decoded by a decoder trained on t_i. The plots are symmetrical, so this interpretation is valid both in a row-wise and in a column-wise fashion. High values that lie off-diagonal indicate that decoders can generalise cross-temporally to other time points, i.e. they can decode time points other than those they were trained on. This cross-temporal generalisation is an indication that information is represented statically. In contrast, a CTDA with high decoding values that lie just along its diagonal indicates that a decoder trained on time step t_i could not decode very well at time step t_{i+1}. This absence of cross-temporal decoder generalisation suggests a dynamic way in which information is represented. The CTDAs created by Wolff and colleagues showed such diagonals, and the aim of the present study is to present a large-scale, spiking-neuron model that can show the same CTDA patterns.

1.1.3 Structure and augmentations

Let us return to the Pals et al. model to consider its structure. The model consisted of two modules, one for each hemisphere, and every module contained a sensory, a memory, a comparison, and a decision population (see Figure 1.2). The grating images linked to the sensory population, which connected to the comparison population in two ways: one connection went directly from sensory to comparison, while the other one went via the memory population first, and only then to the comparison population. Finally, the comparison population connected to the decision population. All of the aforementioned connections are feed-forward. In addition, a recurrent STSP connection was added to the memory population. This architecture worked well to replicate the behavioural results from Wolff et al. (2017), but it did not suffice to fully replicate the dynamic coding properties found in human EEG data: the CTDA of the original Pals model showed a pattern that lasted just tens of milliseconds, while human data show patterns that last hundreds of milliseconds (Figure 1.1). In addition, the pattern was so short that it was hard to determine whether it was an example of dynamic coding or simply static coding.

Figure 1.1: A cross-temporal decoding analysis (‘WM dynamics after stimulus presentation’), constructed from EEG data measured when a stimulus was presented and held in WM. Both axes show time (0 to 1000 ms); the colour scale shows decodability (normalised V), from -2e-3 to 2e-3. The warmer (i.e. more red) the colours, the better the decoding. The clustering of high decoding values on the figure’s diagonal and nowhere else indicates little cross-temporal generalisation and thus dynamic coding. Data from Wolff et al. (2017) (Experiment 1).

Figure 1.2: The model from Pals et al. (2020). Stimuli are transformed into a vector by the sensory population, which then sends it to the memory and comparison populations. For the first stimulus in a trial, the memory will hold on to its representations via recurrent STSP connections. During second stimulus presentation, memory and sensory will respectively project the first and second stimulus to the comparison population, after which the decision population uses the information from the comparison population to determine how to act on the perceived stimuli difference.

Figure 1.3: The augmented model. An ‘Eye’ population has been added in between the stimulus and the sensory population. The eye population and the sensory population are connected by both a feed-forward connection with different synaptic delays (the stacked arrows pointing right) and a recurrent connection (the curved arrow pointing left). The neurons of the memory population are subject to noise through intermittent firing (which is indicated by dotted borders).

Therefore, in this study, the model has received three augmentations (see Figure 1.3). The expectation was that the length of the original model’s CTDA pattern would correlate with the amount of time the stimulus was represented in the model, so the augmentations mainly aimed to increase stimulus representation time. The augmentations are as follows:

• Distributed synaptic delays on a feed-forward connection from an added eye module. This causes the representation of the stimulus to not reach the rest of the model in one piece, but distributed over a prolonged period of time.

• Recurrent connections back from the sensory population towards the eye population, in line with research indicating that lower-level visual areas in the human brain also receive input from higher-level areas (Lamme and Roelfsema, 2000). This was assumed to prolong stimulus representation even further.

• Background noise through intermittent firing in the memory population. Random spikes were expected to reactivate (parts of) the recurrent STSP connections, thereby allowing the saved stimulus representation to be maintained a little longer (albeit in an imperfect way).

For a more in-depth discussion of these augmentations, see Wijs (2020). An additional parameter was the strength of the synaptic delay filter applied to the spiking data of the memory population. This synaptic filter delayed all individual spikes and effectively smeared them out over time, making the data resemble those as measured by EEG devices. The effects of the augmentations and the synaptic filter have been explored, and the resulting model has been fitted to match the Wolff et al. data.

Figure 2.1: Trial schematic for Experiment 1. Two gratings, presented at the start, would have to be kept in memory. One of them would need to be compared to a grating at the end (the probe). Only after memory item presentation would participants receive a cue indicating which item ought to be remembered and which one could be forgotten. Between the cue and the probe, an impulse, a vivid stimulus supposed to function as a ‘ping’ to WM, appeared. These events were all padded with delays. The duration of every stage is given below the depiction of the stage. The dots in the middle of the stage depictions are fixation dots. Stage sequence: memory items (250 ms), delay (800 ms), cue (200 ms), delay (900 ms), impulse (100 ms), delay (400 ms), probe (250 ms).

Figure 2.2: Trial schematic for Experiment 2. In this experiment, participants had to maintain two gratings in memory, which would both be tested. Participants knew the order in which the memory items would be tested. After memory item presentation, a first impulse followed, succeeded by the probe for the memory item that would be tested early. Following that, a second impulse would appear with the probe for the late-tested memory item in its wake. All these stages were padded with delays. Stage sequence: memory items (250 ms), delay (950 ms), impulse 1 (100 ms), delay (500 ms), probe 1 (250 ms), delay (1750 ms), impulse 2 (100 ms), delay (400 ms), probe 2 (250 ms).

2 Methods

To test the augmented model for dynamic coding, it had to perform two tasks which it also performed in Pals et al. (2020). The tasks were adapted experiments from Wolff et al. (2017) and will be explained below. With the data gathered from the task execution by the model, analyses were conducted to be able to construct CTDAs. These analyses will be explained after the experiment descriptions.

Note that, before the full experiments were run, several parameter sweeps were conducted to find the best set of the model parameters mentioned in Section 1.1.3. Discussing those results falls outside the scope of this article, but some of the sweep results can be found in Appendix A.

2.1 Experiments

Two experiments from Wolff et al. (2017) were conducted with the augmented model, namely Experiments 1 and 2. In Experiment 1, for every trial, subjects had to watch two simultaneously presented, randomly oriented gratings and keep them both in memory. At the end of the trial, a different grating called the ‘probe’, with a different orientation, would appear. One of the two memory gratings would have to be compared to the probe, but participants did not initially know which one. After the two initial gratings had disappeared and a subsequent delay had passed, a cue appeared that indicated which of the two items would actually need to be compared to the probe. After a second blank-screen delay period, a short impulse (a vivid image intended to function as the WM ping) appeared, and after a third delay, the probe (the final test grating) appeared. Subjects had to indicate whether the probe was rotated clockwise or anti-clockwise with respect to the cued memory item. See Figure 2.1.

For Experiment 1, the model was run with 30 different randomisation seeds to simulate 30 different participants. Then, with each ‘participant’, 1344 trials were conducted, just as Wolff and colleagues did with their human participants.

Experiment 2 was similar, but now both memory items would be tested; participants were told beforehand which item would be tested first (the ‘early’ item) and which one would be tested second (the ‘late’ item). After memory item presentation and a delay, an impulse and a subsequent delay followed. Then probe 1 appeared, to which the early item had to be compared. Another delay followed, with a second impulse. Finally, after a last delay, the final probe appeared, to which the late item had to be compared. See Figure 2.2.

For this experiment, 19 ‘participants’ were created, each of which performed 1728 trials. These numbers are also the same as those mentioned by Wolff and colleagues.

2.2 Decodability analysis

The orientation decoding method as outlined in Wolff et al. (2017) was adapted and applied to the output of the memory-module neurons of the model.∗ Wolff et al. ran separate analyses for the cued and uncued stimuli, so the same was done for the current study. For every experiment trial i, they first calculated all other trials’ initial stimulus angle relative to trial i’s initial stimulus angle. For example, when the angle of trial i is 40°, and the angle of one of the other trials is 39°, the relative angle becomes −1°. They then binned all trials but trial i according to those relative angles. These bins’ centres ranged from −π/2 up to but not including π/2, where every bin centre was π/6 apart and the bin width was π/6, meaning that the bins overlap (see Figure 2.3). Turning a stimulus π rad yields the exact same stimulus, meaning that an angle of 3π/4 rad is treated the same as an angle of −π/4 rad.

∗ For the code used in these analyses, visit https://

Figure 2.3: An illustration of the stimulus angle bins. The x-axis denotes the angle of the stimuli in radians (from −π/2 to π/2) and the vertical, dashed lines indicate the bin centres. The width of every bin is visualised by a horizontal line on the bin centre that has the same colour as the bin centre line.

Then, for every time step of the EEG data, the Mahalanobis distances (MDs) between the EEG data of trial i and every bin’s EEG data were calculated. The MD is a distance measure for multivariate data that takes into account correlation in those data, and can therefore deal with non-spherical clusters (unlike the Euclidean distance) (De Maesschalck, Jouan-Rimbaud, and Massart, 2000). This is done by incorporating the inverse variance-covariance matrix into the calculation. The more similar the EEG data sets, the smaller the MD. Ideally, the EEG data recorded during the presentation of stimulus angles similar to trial i’s stimulus angle should be similar to trial i’s EEG data, meaning that the MD should be smallest for the 0 rad bin and largest for the ±π/2 bins, resulting in an MD curve as shown in Figure 2.4.
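The binning and distance steps described above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up data and not the analysis code of the thesis or of Wolff et al.: the function names, the toy encoding of orientation in four channels, the plain sample covariance, and the choice to treat π/6 as a half-width (so that neighbouring bins overlap) are all assumptions.

```python
import numpy as np

def wrap_angle(theta):
    """Map orientations to [-pi/2, pi/2); a grating is identical after a pi rotation."""
    return (np.asarray(theta) + np.pi / 2) % np.pi - np.pi / 2

BIN_CENTRES = np.arange(-3, 3) * np.pi / 6  # -pi/2, -pi/3, ..., pi/3

def md_curve(data, angles, i, half_width=np.pi / 6):
    """Mahalanobis distance from trial i to the mean of every relative-angle bin.

    data:   (n_trials, n_channels) activity at one time step.
    angles: (n_trials,) stimulus orientations in radians.
    half_width is an assumption chosen so that neighbouring bins overlap.
    """
    others = np.arange(len(angles)) != i
    rel = wrap_angle(angles[others] - angles[i])  # angles relative to trial i
    rest = data[others]
    # Assumption: plain sample covariance of the remaining trials.
    cov_inv = np.linalg.pinv(np.cov(rest, rowvar=False))
    distances = np.empty(len(BIN_CENTRES))
    for b, centre in enumerate(BIN_CENTRES):
        in_bin = np.abs(wrap_angle(rel - centre)) <= half_width
        diff = data[i] - rest[in_bin].mean(axis=0)
        distances[b] = np.sqrt(max(diff @ cov_inv @ diff, 0.0))
    return distances

# Toy data: orientation encoded as cos/sin of twice the angle in four noisy channels.
rng = np.random.default_rng(0)
angles = rng.uniform(-np.pi / 2, np.pi / 2, size=400)
signal = np.column_stack([np.cos(2 * angles), np.sin(2 * angles)])
mixing = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
data = signal @ mixing + rng.normal(0.0, 0.1, size=(400, 4))
curve = md_curve(data, angles, i=0)
```

On this toy data the distance is smallest for the bin centred on a relative angle of 0, which is the shape the ideal MD curve is expected to have.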

As can be seen in the figure, the MDs expected in ideal decoding conditions form a sinusoid; more specifically, a cosine that is stretched along the x-axis, inverted, and translated above the line y = 0. In other words, re-stretching, mean-centring and inverting the MD curve should yield a cosine when the stimulus is presented and represented in the participant’s brain. The stretching is done by multiplying all stimulus angles by 2; the same stretch is applied to the bin width (which changes it from π/6 to π/3). The transformations are illustrated in Figure 2.5. When transformed, the MD curve can be convolved with an actual cosine to get a measure of angle decodability: the better the stimulus angle is represented in the brain, the more pronounced it will be in the EEG data, the more the MD curve will look like a transformed cosine, and the higher the value that results from convolving the transformed MD curve with an actual cosine. This value is called the cosine similarity.

Once this cosine similarity has been calculated for every time step of trial i, the result is a cosine similarity curve (also called a decodability curve) for trial i, which shows at which point in the trial the angle could be properly decoded from the EEG data. Such decodability curves are calculated for every trial and used to calculate a mean decodability curve that shows how well an angle could be decoded at which time point across all trials.
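The transformation from an MD curve to a single decodability value can be written compactly. The sketch below is an assumption about the form of that calculation: it mean-centres and inverts the curve and then takes a cosine-weighted mean over the bins, with the bin centres multiplied by 2 as the stretching step. The function name `decodability` and the normalisation by the number of bins are illustrative choices, not details confirmed by the text.

```python
import numpy as np

def decodability(distances, bin_centres):
    """Mean-centre and invert an MD curve, then weight it by cos(2 * centre)."""
    distances = np.asarray(distances, dtype=float)
    centred = distances - distances.mean()
    weights = np.cos(2 * np.asarray(bin_centres))  # stretching: angles times 2
    return float(np.mean(-centred * weights))

centres = np.arange(-3, 3) * np.pi / 6
ideal_curve = 2.0 - np.cos(2 * centres)   # smallest distance at 0 rad
flat_curve = np.full(6, 2.0)              # no orientation information
inverted_curve = 2.0 + np.cos(2 * centres)

ideal_score = decodability(ideal_curve, centres)
flat_score = decodability(flat_curve, centres)
inverted_score = decodability(inverted_curve, centres)
```

An ideal curve gives a positive score, a flat curve gives zero, and a curve with the opposite shape gives a negative score, so the sign and size of the value indicate how closely the MD curve follows the expected cosine.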

Figure 2.4: Ideal form of the Mahalanobis distance across stimulus angle bins, relative to the stimulus angle of trial i (‘Ideal Mahalanobis distance over bins’; x-axis: stimulus angle (rad), y-axis: Mahalanobis distance (V)). The dashed lines denote the bin centres, while the dots indicate the MD one would expect for the angle bin centres they are placed upon. An infinite number of bins would result in the sinusoidal line drawn through the dots.

Figure 2.5: The ideal MD curve across bins (‘Transformations of the MD curve’), first stretched along the x-axis, then mean-centred and finally inverted. Note that the domain is now [−π, π) instead of [−π/2, π/2) due to the stretching. (Panels: Stretched, Mean-centred, Inverted; x-axis: stimulus angle (rad); y-axis: Mahalanobis distance (V); legend: bin centres, MD.)

The procedure described above was applied to data from the model. The data consist of several trials of filtered spiking activity of individual neurons from the memory population, for every time step. There were 1500 neurons in the memory population. Experiment 1 lasted 3 seconds and Experiment 2 lasted 4.6 seconds, so with a resolution of 2 milliseconds, that comes down to 1500 time steps for Experiment 1 and 2300 time steps for Experiment 2. To mimic EEG measurements, where the electrical behaviour of groups of cortical neurons is recorded, but also to decrease computational complexity, the K-means algorithm was used to group neurons together into 17 groups (convergence declared when inertia was below 10^−20, best of 20 runs). The number 17 was chosen because Wolff et al. used 17 electrodes in their EEG measurements. After clustering, the activity of all neurons in a group was averaged, resulting in a data structure of [trials] by 17 by [time steps]. Then, noise from a normal distribution (µ = 0, σ = 0.5) was added to all of the data, as the model neurons are completely silent when no input is presented (i.e. their output is 0). Completely silent neurons would result in both computational errors (division by zero) and a higher than intended decodability (because all neurons look alike when they are silent). Along with the modified neuron data, the angles of the initially presented stimulus were used in the decodability analysis. The bins and their width were kept the same as in Wolff et al.
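The preprocessing chain (cluster neurons, average within clusters, add noise) might look roughly as follows. This is a sketch under stated assumptions: the small Lloyd-style K-means below is a stand-in for whichever K-means implementation was actually used, each neuron is assumed to be described by its activity across all trials and time steps, and the names `kmeans_labels` and `to_pseudo_channels` are hypothetical. Only the number of groups (17), the 20 restarts and the noise parameters come from the text.

```python
import numpy as np

def kmeans_labels(features, k, n_runs=20, n_iter=100, seed=0):
    """Minimal Lloyd's K-means; returns the labels of the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_runs):
        centres = features[rng.choice(len(features), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centres = np.array([
                features[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                for c in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        inertia = d2[np.arange(len(features)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def to_pseudo_channels(activity, k=17, noise_sd=0.5, seed=0):
    """activity: (n_trials, n_neurons, n_steps) filtered spiking data."""
    n_trials, n_neurons, n_steps = activity.shape
    # Assumption: a neuron's feature vector is its activity over all trials and steps.
    features = activity.transpose(1, 0, 2).reshape(n_neurons, -1)
    labels = kmeans_labels(features, k, seed=seed)
    channels = np.zeros((n_trials, k, n_steps))
    for c in range(k):
        members = labels == c
        if members.any():
            channels[:, c, :] = activity[:, members, :].mean(axis=1)
    # Additive Gaussian noise so that silent periods are not exactly zero.
    rng = np.random.default_rng(seed)
    return channels + rng.normal(0.0, noise_sd, size=channels.shape)

# Small toy input: 8 trials, 60 neurons, 50 time steps.
toy_activity = np.random.default_rng(1).random((8, 60, 50))
channels = to_pseudo_channels(toy_activity)
```

The result has the [trials] by 17 by [time steps] layout described above, which is the shape the decoding analyses operate on.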

2.3 Cross-temporal decodability analysis

A cross-temporal analysis (which is an example of multivariate pattern analysis; King and Dehaene, 2014) was conducted to determine the amount of dynamic coding present in the model data. To understand how the analysis works, consider Figure 2.6, which illustrates the process for a single experiment trial i. The main idea is that, for a stimulus angle bin b, and then for every time step t_x, the bin data at t_x are compared to the data of trial i at all other n time steps t_0, ..., t_n. ‘Compared’ in this sense means calculating the MD between the data of b and the data of trial i. This procedure yields n² MD values for one bin.

Figure 2.6: Illustration of the cross-temporal decoding process. For every stimulus angle bin b, and then for every time step t_x, the MD is calculated between the EEG state of trial i (17 channel values) at all time steps t_0, ..., t_n on the one hand and the bin mean at t_x on the other hand. This results in an n-by-n grid of MD values for b. Repeating for multiple bins gives multiple grids.

Once the procedure has completed for all bins, there are effectively n² MD curves, which can undergo the same set of transformations and convolution as explained in Section 2.2. This in turn results in n² decodability scores, which can be arranged in an n-by-n grid. These grids are then averaged together for all trials to get a cross-temporal decoding matrix. A diagonal with high decoding values and low values everywhere else in such matrices is a hallmark of dynamic coding: it indicates that at every time step t_x, there can only be proper decoding in the context of that same time step t_x (i.e. with respect to the bins constructed from the data at that moment t_x), but not in the context of any other time step (e.g. t_{x+1}).
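The cross-temporal procedure can be sketched as a nested loop over trials, bin-defining time steps and bins. The code below is an illustration on synthetic data and not the analysis code used in this study: the function name, the plain sample covariance, the π/6 half-width of the bins and the toy data (in which every time step encodes the orientation in a different pair of channels, i.e. a deliberately dynamic code) are assumptions.

```python
import numpy as np

def wrap_angle(theta):
    """Map orientations to [-pi/2, pi/2)."""
    return (np.asarray(theta) + np.pi / 2) % np.pi - np.pi / 2

def cross_temporal_decoding(data, angles, half_width=np.pi / 6):
    """data: (n_trials, n_channels, n_steps); returns an (n_steps, n_steps) matrix.

    Entry [x, y] is the mean decodability when bins built at step x are
    compared with single-trial data at step y.
    """
    n_trials, n_channels, n_steps = data.shape
    centres = np.arange(-3, 3) * np.pi / 6
    weights = np.cos(2 * centres)
    total = np.zeros((n_steps, n_steps))
    for i in range(n_trials):
        others = np.arange(n_trials) != i
        rel = wrap_angle(angles[others] - angles[i])
        rest = data[others]
        md = np.empty((len(centres), n_steps, n_steps))
        for x in range(n_steps):
            cov_inv = np.linalg.pinv(np.cov(rest[:, :, x], rowvar=False))
            for b, centre in enumerate(centres):
                in_bin = np.abs(wrap_angle(rel - centre)) <= half_width
                bin_mean = rest[in_bin][:, :, x].mean(axis=0)  # (n_channels,)
                diff = data[i].T - bin_mean                    # (n_steps, n_channels)
                sq = np.einsum("tc,cd,td->t", diff, cov_inv, diff)
                md[b, x] = np.sqrt(np.maximum(sq, 0.0))
        # Mean-centre across bins, invert, and weight by the cosine of twice the centre.
        centred = md - md.mean(axis=0, keepdims=True)
        total += np.tensordot(weights, -centred, axes=(0, 0)) / len(centres)
    return total / n_trials

# Toy 'dynamic code': step t stores the orientation in channels 2t and 2t + 1.
rng = np.random.default_rng(0)
n_trials, n_steps = 120, 4
angles = rng.uniform(-np.pi / 2, np.pi / 2, size=n_trials)
data = rng.normal(0.0, 0.2, size=(n_trials, 2 * n_steps, n_steps))
for t in range(n_steps):
    data[:, 2 * t, t] += np.cos(2 * angles)
    data[:, 2 * t + 1, t] += np.sin(2 * angles)
ctda = cross_temporal_decoding(data, angles)
```

Because the toy code changes its channels at every step, decodability is high only on the diagonal of `ctda`; a static code, with the same channels carrying the orientation at all steps, would instead fill the off-diagonal entries as well.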

The preparations of the model data were nearly the same as for the regular decoding analysis: the model neurons were combined into 17 clusters through the K-means clustering algorithm and averaged together for every cluster, after which noise was added. The only adaptation was the splitting of the set of trials of each model participant into two equally sized halves in an effort to further decrease computational complexity; CTDAs were conducted on both halves separately, and the results were averaged together.

3 Results

After conducting both experiments and collecting the corresponding model data, the analyses as outlined in Sections 2.2 and 2.3 were applied. This section presents the results of those analyses and compares the model results with the human data gathered by Wolff and colleagues, so as to ascertain how well the model approaches the human results. In all of the figures that follow, the term ‘Data’ refers to the human data and results, while the term ‘Model’ refers to the results of the augmented model.

In order to be able to compare the Data and Model results, however, some conversion has to be done first, because the Model decodability scores are about a factor of 10 higher than the Data decodability scores. To bring all scores to the same scale, z-scores have been calculated for every individual analysis. (Means and standard deviations are recalculated for every analysis instance, unless stated otherwise.)
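The effect of the z-scoring step is easy to demonstrate. The sketch below uses invented numbers purely for illustration: two curves with the same shape but a tenfold difference in scale end up identical after each is standardised with its own mean and standard deviation. The function name `zscore` and the example values are assumptions.

```python
import numpy as np

def zscore(scores):
    """Standardise scores with their own mean and standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Hypothetical curves: same shape, ten times larger in the 'model' version.
data_curve = np.array([0.0, 1e-3, 2e-3, 1e-3])
model_curve = 10 * data_curve

data_z = zscore(data_curve)
model_z = zscore(model_curve)
```

After standardisation the two curves coincide, so differences that remain between Data and Model z-scores reflect differences in shape and not in overall scale.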

3.1 Experiment 1

As for Experiment 1, Figure 3.1 shows the Data and Model z-scores for both the regular (A and C, respectively) and cross-temporal (B and D, respectively) decodability of both memory items combined, during and after their presentation. The curves in A and C are relatively similar, with a steep ascent and a (relatively) smooth decay. However, it has to be mentioned that while the Model curve remains high for roughly the stimulus duration (250 ms) before decaying, the Data curve starts decaying rather quickly after reaching its peak. Moreover, the Data curve shows a second peak, which the Model does not.

Considering the CTDAs in B and D, some more differences can be seen. The cross-temporal Data results show a very strong diagonal that continues almost up to 750 ms. The Model CTDA, however, shows a diagonal that seems slightly thicker, shorter and less strong compared to its own surrounding decodabilities. More notably, the Model CTDA shows vertical and horizontal strokes of decodability (referred to as ‘arms’), which start around 200 ms and continue for as long as the diagonal continues. The Data CTDA, however, shows no arms. Another difference is the appearance of periodic spots of slightly higher decodability in the Data CTDA, which are absent in the Model CTDA. Having said that, the Model CTDA definitely seems to show a diagonal that has at least roughly the same length as the one in the Data CTDA.

To show that the augmented model can still maintain its original function (being a model of activity-silent WM), Figure 3.2 considers the decodability of both the cued and uncued memory items after impulse presentation. Recall that the impulse is a vivid stimulus which was supposed to work as a ‘ping’ to WM and re-elicit silent WM item representations. As can be seen in the Model part of the figure (right pane), after impulse presentation (grey bar), both cued and uncued memory items could be decoded from the model data, although the cued item could be decoded slightly better than the uncued one. In contrast, the left pane of the figure shows that only the cued item could be decoded from the Data; according to calculations by Wolff et al. (2017), the uncued decodability was not significantly different from 0. This reveals an error in the augmented model: the uncued item can be decoded from its data, while that should not be the case.

Figure 3.1: The overall decodability of the combined (cued and uncued) memory items in Experiment 1, just after their presentation. The grey bars indicate memory item presentation. Data: the results of human data from Wolff et al. (2017). Model: the results of the data from the augmented model. All decodability scores are z-scores. A: Decodability curve of the memory items from the Data. B: The CTDA of the memory items from the Data. C: Decodability curve of the memory items from the Model. D: The CTDA of the memory items from the Model.

Figure 3.2: The separate decodability curves of the cued and uncued memory items after the impulse in Experiment 1. Time is shown relative to the impulse onset, and grey bars indicate impulse presentation. The left pane shows the curves for the Data (whose z-scores make use of the same mean and standard deviation), while the right pane shows the curves for the Model (which also share their mean and standard deviation).

Figure 3.3: The Data (left) and Model (right) performance. The x-axis indicates how much the cued memory item and probe differ in degrees, while the y-axis reports the proportion of clockwise responses given for the angular difference on the x-axis.

discussed, the focus can now be shifted towards the performance of the model; after all, the augmented model should be functional. More specifically, as it models parts of human cognition, it should ideally be as good as (and not worse or better than) humans. Figure 3.3 shows the Data performance next to the Model performance. Absolutely perfect task performance (which would not be human) would give ‘curves’ that are a flat 0 (no clockwise responses given) for negative angular differences, rise straight up at an angular difference of 0 degrees and are a flat 1 (only clockwise responses given) for positive angular differences. The Data and Model curves are rather similar, with both sub-figures showing an S-shaped curve. This indicates the augmented model is still a good approximation for human behaviour, just like the original Pals et al. (2020) model.
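The performance curves described above can be illustrated with a short sketch. It computes the proportion of clockwise responses per angular difference for a hypothetical noisy observer and for a perfect one. The probe offsets, the trial count and the observer noise (`levels`, 200 trials per level, a standard deviation of 15 degrees) are assumptions chosen for illustration only; they are not the values used in the experiments or in the model.

```python
import numpy as np

def proportion_clockwise(angle_diffs, responses_cw):
    """Return the unique angular differences (sorted) and, for each one,
    the proportion of trials that were answered 'clockwise'."""
    angle_diffs = np.asarray(angle_diffs, dtype=float)
    responses_cw = np.asarray(responses_cw, dtype=float)
    levels = np.unique(angle_diffs)
    props = np.array([responses_cw[angle_diffs == lv].mean() for lv in levels])
    return levels, props

def ideal_observer(angle_diffs):
    """Perfect performance: 'clockwise' only for positive differences."""
    return (np.asarray(angle_diffs) > 0).astype(float)

# Hypothetical probe offsets in degrees (assumption, not the thesis values).
levels = np.array([-40.0, -20.0, -10.0, 10.0, 20.0, 40.0])
rng = np.random.default_rng(0)
diffs = np.repeat(levels, 200)

# Noisy observer: the internal estimate of the difference is jittered,
# which turns the ideal step function into an S-shaped curve.
noisy_estimate = diffs + rng.normal(0.0, 15.0, size=diffs.shape)
responses = noisy_estimate > 0

_, p_noisy = proportion_clockwise(diffs, responses)
_, p_ideal = proportion_clockwise(diffs, ideal_observer(diffs))
print("noisy observer:", np.round(p_noisy, 2))
print("ideal observer:", p_ideal)
```

The ideal observer yields the step function mentioned in the text, while the noisy observer yields intermediate proportions for small angular differences.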

3.2 Experiment 2

Considering Experiment 2, Figure 3.4 sheds some more light on the dynamics of the Model with respect to the Data. The figure has the following build-up: A and C denote, respectively, the Data and Model decodability curves for the early- and late-tested items. Meanwhile, B and D, associated with the Data and Model results, respectively, each show two separate CTDAs; the left one in each sub-figure depicts the dynamics of the early-tested item, while the right one gives the CTDA of the late-tested item.

Compared with the results from Experiment 1, a number of differences, but also similarities, can be seen. For example, the Data curves in A do not show a clear second peak such as the one seen in Figure 3.1A, but are otherwise rather similar to it, with a steep ascent and a slow decay. However, the CTDAs in B do differ from the CTDA in Figure 3.1B, with the tested-early CTDA (left pane) showing a thicker diagonal, while the tested-late CTDA (right pane) shows a thinner one. The Model curves in C, on the other hand, are nearly identical to the curve in Figure 3.1C. The same can be said for the Model item representation dynamics in D, which both look very similar to the dynamics shown in Figure 3.1D.

Decodability memory items Experiment 2 (panels A and B: Data; panels C and D: Model)

Figure 3.4: The decodability of the early- and late-tested item, just after their presentation. The top row shows the Data results, the bottom row shows the Model results (after Wolff et al., 2017). Each sub-figure (A, B, C and D) has a separate shared mean and standard deviation used in the calculation of the z-scores. A The decodability curves for the early- and late-tested Data memory items after their presentation. B The CTDAs corresponding to A. C The decodability curves for the early- and late-tested Model memory items after their presentation. D The CTDAs corresponding to C.

Decodability memory items after impulses Experiment 2 (top row: Data; bottom row: Model)

Figure 3.5: The decodability curves of the early- and late-tested items after the impulses of Experiment 2. The top row shows the Data curves (after Wolff et al. (2017)), the bottom row shows the Model curves. The left column shows the curves around the first impulse, the right column shows them around the second impulse. Time is relative to the corresponding impulse, and the grey bars indicate either the first impulse (left column) or the second one (right column). The Data z-scores share the mean and standard deviation used in their z-score calculation, as do the Model z-scores.

Comparing the different results within Figure 3.4 to each other, more insights can be gained. For instance, there is not much of a difference between the decodability curves (C) and between the dynamics (D) of the early- and late-tested items for the Model. For the Data, however, there seem to be substantial differences. A suggests that the early-tested item was decoded much better than the late-tested item. In B, the differences seem even more extreme: While the tested-late dynamics show a diagonal that resembles the diagonal shown in Figure 3.1B (albeit a bit shorter), the tested-early dynamics show a diagonal that is stronger, continues for a longer amount of time and seems to ‘fan out’ as time progresses. When looking closely at the start of the diagonal (around the 200th millisecond), two little arms can be seen, which are also very clearly present in the Model dynamics.

Taking this into account, in some respect both the early- and late-tested Model dynamics are more similar to the late-tested Data dynamics, as neither shows a fanning out of the diagonals and the diagonals are of roughly the same length. However, in a different sense the Model dynamics are more similar to the early-tested Data dynamics, as they all show some sort of arms. Additionally, the thickness of the Model diagonals seems to match the early-tested Data diagonal better.

Looking further, especially at the decodability of the items after the impulses (Figure 3.5), a similar picture arises. The Data curves show a clear difference, with the early-tested item being more strongly represented after the first impulse, while after the second impulse, the late-tested item is more prominent. The Model curves show markedly less difference after each impulse.

Finally, the performance on Experiment 2 has to be considered. Figure 3.6 suggests that the Model and Data performance are very similar to each other for the early-tested item, which would mean the Model is still a good behavioural approximation for the Data. However, for the late-tested item, the Model in its augmented form seems to perform considerably worse than the Data, in contrast with the original model. Evidently, the late-tested item is not as well-preserved in the Model as it should be.

4 Discussion

A model created by Pals et al. (2020), originally designed to be a functional implementation of activity-silent WM and able to execute tasks from Wolff et al. (2017), was augmented to display dynamic coding patterns visible in human EEG data. The experiments from Wolff and colleagues were conducted again with the augmented model, and the resulting data were fed into decodability analyses. The results were compared to the human ones. This section will further discuss the similarities and differences between the model and human data, suggest points of improvement and further research, and finish by drawing conclusions on the overall fitness of the model, as well as on its implications for dynamic coding.

4.1 Decodability

In Sections 3.1 and 3.2, all Model decodability curves are very similar, with a steep ascent and a graceful decay, while the Data show some more variability in their decodability curves. One example of this is the clear second peak seen in the Data curve after memory item presentation in Experiment 1, which is not present in the Model curve. This might reflect some other processes being active at the same time in every trial in the participants’ brains that are absent in the model (e.g. attentional shifts, feedback from higher-order stimulus-processing areas to lower-order areas et cetera), which in some way reinstate the stimulus representation.

As for Experiment 2, there seems to be a clear difference between the early and late Data decodability curves, while the Model curves are almost completely identical. A possible explanation for this phenomenon might be attention: Participants knew in advance which item would be tested first, so it is not unreasonable to assume that they might pay more attention to that item. Attention is known for strengthening neural responses to attended stimuli (Kastner and Ungerleider, 2000), which would suggest that it might also strengthen the WM representation of the early-tested item, explaining the difference in the maxima of the Data curves. The model does not implement attentional control. Instead, attentional effects were simulated by reducing the late-tested item strength to 90% of the early-tested input, which might not be enough to create a differentiation between the early and late Model decodability curves.

4.2 CTDAs

The CTDAs show a picture that matches the one shown by all the decodability curves. All Model CTDAs look alike, while the Data CTDAs show some interesting differences; not only among themselves, but also when compared to the Model CTDAs. Differences that already become clear from Experiment 1 are the periodic spots that appear in the Data CTDA but not in the Model one. They could be the result of some other, consistently recurring brain processes, just like such processes could cause the second peak seen in the Data decodability curve of Experiment 1. Another difference between the Data and Model CTDA is the absence of the arms in the Data CTDA. Perhaps this is related to the clean and stronger representation of memory items in the Model, which are undisturbed by any other processes, while in the human brain, these representations might have to contend with others. This view would fit in with the fact that the decodability curve of the Model has a higher maximum than the Data curve.

Considering Experiment 2, the differences between the dynamics of the early- and late-tested memory items are more pronounced for the Data than for the Model, corresponding to characteristics of the respective decodability curves. More specifically, for the Data, the early-tested diagonal is thicker than the late-tested one, and also ‘fans’ out into some semi-static pattern as time progresses. In addition, it has two small arms at roughly the same location the Model CTDAs have them, although the Data arms do not last quite as long as the Model ones. The variability in both the arms and the thickness of the diagonal can again be explained with the decodability curves: The higher the curve’s maximum, the more pronounced the arms and the thicker the diagonal. This suggests that the strength of the item representation in WM has an effect on its dynamics as displayed in a CTDA, which in turn relates to the presumed effects of attention.

However, stimulus strength alone does not explain the high-decodability fan visible in the early-tested Data dynamics. An interesting note is that such fans also appear in the model parameter sweep, specifically for model versions with strong recurrent connections (Appendix A). A possible reason for the fan might then be the rehearsal aspects often associated with attention: Reactivation of the neurons responsible for the item representation would lead to an extended diagonal that eventually becomes static.

4.3 Impulse response

The responses to the impulse show a rather critical point of improvement for the model, especially in Experiment 1. Whereas the uncued Data decodability curve remains close to its minimum, the uncued Model decodability curve becomes nearly as high as the cued one. Clearly the model holds on too well to the uncued item, perhaps due to all the augmentations that were intended to prolong the item activation in the first place. The response to the second impulse from Experiment 2 suggests such a deficit as well: Although the early-tested item is no longer needed, its decodability is not much lower than that of the late-tested item. Combined with the fact that Wolff et al. (2017) showed that the late-tested item curve was significantly different from 0 and the early-tested item was not, this also indicates an item maintenance that works too well. Adding more noise to the model, in order to distort item representations more quickly, seems like a potential solution.

That the Model responses to the first impulse of Experiment 2 are nearly identical is to be expected: Neither item has been reactivated more than the other at the time the first impulse arrives. Both items also start out with similarly strong representations after stimulus presentation (Figure 3.4C), presumably due to the lack of attentional control.

4.4 Performance

The Model performance for Experiment 1 and the early-tested item of Experiment 2 is roughly the same as the Data performance, which is good. However, the Model seems to perform considerably worse on the second probe of Experiment 2, for the late-tested item. So while the uncued and early-tested items seem to be preserved too well in the model, the late-tested item seems to be preserved too poorly, which would appear rather contradictory at first sight. However, attention might explain this contrast as well: Wolff and colleagues observed a strong lateralisation in their EEG data after the early-tested item was probed, and suspected that this lateralisation might reflect a shift in attention towards the initially deprioritised, late-tested item. Adding more noise to the Model to allow it to quickly ‘forget’ unnecessary items, while also adding some form of attentional control that would allow it to reinstate WM items that become important only later on, would then seem the right approach for a more human-like model.

4.5 Dynamic coding

Overall, the model data show many similarities to the human data, while also showing some important differences. To combat these differences, it would seem that the addition of attentional control paired with a bit more noise added to the memory population might suffice. Most crucially, however, the model displays clear diagonals in its CTDAs, which are hallmark patterns often cited in the literature as evidence for dynamic coding. Namely, as mentioned before, such diagonals indicate that a decoder trained on data at some time step t can decode data at that same time step t very well, but data at a more distant time step, say t + 1, rather poorly, which suggests a dynamic item representation. The fact that the model displays such patterns has a number of implications.
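The logic of training a decoder at one time step and testing it at another can be made concrete with a minimal sketch on synthetic data. The decoder below is a plain nearest-class-mean classifier with Euclidean distance, chosen for brevity; it is an assumption of this sketch and not the decoding pipeline used in this study or by Wolff et al. (2017). The data dimensions and the way the ‘dynamic’ code is generated (an independent random class pattern at every time step) are likewise hypothetical.

```python
import numpy as np

def cross_temporal_accuracy(X_train, y_train, X_test, y_test):
    """X_*: arrays of shape (trials, channels, times); y_*: labels 0 or 1.
    Returns a (times, times) matrix with rows = training time step and
    columns = test time step."""
    n_times = X_train.shape[2]
    acc = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        # The class means at the training time step act as the decoder.
        m0 = X_train[y_train == 0, :, t_train].mean(axis=0)
        m1 = X_train[y_train == 1, :, t_train].mean(axis=0)
        for t_test in range(n_times):
            x = X_test[:, :, t_test]
            d0 = np.linalg.norm(x - m0, axis=1)
            d1 = np.linalg.norm(x - m1, axis=1)
            pred = (d1 < d0).astype(int)  # pick the nearest class mean
            acc[t_train, t_test] = (pred == y_test).mean()
    return acc

rng = np.random.default_rng(1)
n_trials, n_channels, n_times = 200, 20, 8
y = rng.integers(0, 2, size=n_trials)

# Synthetic 'dynamic code': each class has a different, independent
# channel pattern at every time step, although the label never changes.
patterns = rng.normal(0.0, 1.0, size=(2, n_channels, n_times))
X = patterns[y] + rng.normal(0.0, 1.0, size=(n_trials, n_channels, n_times))

half = n_trials // 2
acc = cross_temporal_accuracy(X[:half], y[:half], X[half:], y[half:])

diag = np.diag(acc)
off_diag = acc[~np.eye(n_times, dtype=bool)]
print("mean accuracy on the diagonal: ", diag.mean())
print("mean accuracy off the diagonal:", off_diag.mean())
```

With such data, accuracy is high where training and test time coincide and hovers around chance elsewhere, which is the diagonal pattern described above. A static code (the same pattern at every time step) would instead fill the whole matrix.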

One of the first is that the calcium-mediated STSP mechanism as proposed by Mongillo et al. (2008) and implemented here by Pals et al. (2020) is a good basis mechanism for displaying dynamic coding. The original model by Pals and colleagues also needed some extensions, however: The pattern it displayed in a CTDA was simply too short to be classified as either dynamic or static. As the present study has shown, prolonging the WM item representations was the key to seeing dynamic patterns appear in the model data. Having said that, prolonging the representations would seem more of a practical necessity for being able to properly see the patterns, rather than a fundamental property that a dynamically coding network should have. In contrast, the self-modifying nature of networks that implement an STSP mechanism might be what enables dynamic coding. After all, if a network has changed its structure due to previous activation, it is unlikely it will represent the same input it received before in exactly the same way as it did then.

Then what does this mean for the dynamic coding framework as proposed by Stokes (2015)? Stokes sees dynamic coding as the result of a complex reciprocal interaction between some network’s activity states and its hidden states, which would result in a complex trajectory through the activity state space of a neuron or a complete neuronal population. In some sense, this view seems rather fitting: The augmented model uses a hidden state, namely the calcium and resource variables, and that state is modified through network activity. This in turn allows for dynamic coding, and, if so desired, the accompanying trajectories through activity state space by the memory population, or even its individual neurons, can very well be called complex.
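How a hidden state of calcium and resource variables can be modified by activity and outlast it may be illustrated with a minimal sketch in the spirit of the synaptic model of Mongillo et al. (2008). The variable `u` stands for the calcium-dependent facilitation and `x` for the available resources. The parameter values, the Euler integration, the 1 ms time step, the input spike train and the order of the two updates at a spike are all assumptions of this sketch; it is not the implementation used in the model of Pals et al. (2020) or in the augmented model.

```python
import numpy as np

def simulate_stsp(spikes, dt=0.001, U=0.2, tau_f=1.5, tau_d=0.2):
    """Euler simulation of facilitation u ('calcium') and resources x.
    spikes: boolean array, True at time steps with a presynaptic spike."""
    u, x = U, 1.0
    us = np.empty(len(spikes))
    xs = np.empty(len(spikes))
    for i, spike in enumerate(spikes):
        # Passive relaxation of both variables towards their baselines.
        u += dt * (U - u) / tau_f
        x += dt * (1.0 - x) / tau_d
        if spike:
            u = u + U * (1.0 - u)  # a spike raises the facilitation
            x = x - u * x          # and depletes part of the resources
        us[i], xs[i] = u, x
    return us, xs

dt = 0.001
n_steps = 3000                    # 3 s of simulated time
spikes = np.zeros(n_steps, dtype=bool)
spikes[0:500:20] = True           # a 50 Hz burst during the first 0.5 s

us, xs = simulate_stsp(spikes, dt=dt)
efficacy = us * xs                # effective synaptic strength
baseline = 0.2 * 1.0              # efficacy of an unused synapse (U * 1)

print("efficacy 0.5 s after the burst:", efficacy[1000])
print("baseline efficacy:             ", baseline)
```

In this sketch the resources recover quickly while the facilitation decays slowly, so the efficacy remains above baseline for some time after the spiking has stopped. The synapse therefore responds differently to the same input before and after the burst, which is the kind of activity-dependent hidden state discussed in the text.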

However, it is questionable whether these terms are appropriate for what dynamic coding might actually be: A property of any self-modifying activation network. Dynamic coding might be simpler than the ‘big terms’ that Stokes uses would make the reader suspect. For instance, “temporal variability at the very shortest timescales”, as Stokes mentions, turned out to be hardly necessary for dynamic coding: Although rapidly changing neurophysiological parameters might have some influence on WM dynamics in the human brain, a resolution of two milliseconds proved enough for the model to mimic those dynamics. The underlying conceptual framework that Stokes proposes seems appropriate, but it might be best to appraise it in a more modest way, forgoing a focus on terms like “a complex spatiotemporal trajectory through state space”.

In contrast, a more interesting question that still remains unanswered is how a network can maintain the same information through time even though the information representation is dynamic. That question has not been addressed by this study, but judging by the fact that the augmented model can still perform its task while also displaying dynamic coding, the model is capable of this information maintenance even though the information representation is dynamic. This makes the current model a promising tool for future research on dynamic coding and its importance for human WM and broader brain functioning.


References

A. Baddeley. Working memory. Science, 255(5044):556–559, 1992.

C.E. Curtis and M. D’Esposito. Persistent activity in the prefrontal cortex during working memory. Trends in Cognitive Sciences, 7(9):415–423, 2003.

M. Daneman and P.A. Carpenter. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4):450–466, 1980.

R. De Maesschalck, D. Jouan-Rimbaud, and D.L. Massart. The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1):1–18, 2000.

C. Eliasmith. How to build a brain: A neural architecture for biological cognition. New York, NY: Oxford University Press, 2013.

S. Kastner and L.G. Ungerleider. Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23(1):315–341, 2000.

J.R. King and S. Dehaene. Characterizing the dynamics of mental representations: the temporal generalization method. Trends in Cognitive Sciences, 18(4):203–210, 2014.

V.A.F. Lamme and P.R. Roelfsema. The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11):571–579, 2000.

G. Mongillo, O. Barak, and M. Tsodyks. Synaptic theory of working memory. Science, 319(5869):1543–1546, 2008.

M. Pals, T.C. Stewart, E.G. Akyürek, and J.P. Borst. A functional spiking-neuron model of activity-silent working memory in humans based on calcium-mediated short-term synaptic plasticity. PLoS Computational Biology, 16(6):e1007936, 2020.

K.K. Sreenivasan, C.E. Curtis, and M. D’Esposito. Revisiting the role of persistent neural activity during working memory. Trends in Cognitive Sciences, 18(2):82–89, 2014.

M.G. Stokes. ‘Activity-silent’ working memory in prefrontal cortex: a dynamic coding framework. Trends in Cognitive Sciences, 19(7):394–405, 2015.

K. Watanabe and S. Funahashi. Neural mechanisms of dual-task interference and cognitive capacity limitation in the prefrontal cortex. Nature Neuroscience, 17(4):601–611, 2014.

C. Wijs. Dynamic coding in a neural model of activity-silent working memory. Unpublished manuscript, 2020.

M.J. Wolff, J. Jochim, E.G. Akyürek, and M.G. Stokes. Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6):864–871, 2017.

R.S. Zucker and W.G. Regehr. Short-term synaptic plasticity. Annual Review of Physiology, 64(1):355–405, 2002.


A Sweep results

The resulting dynamics of a parameter sweep over three parameters, namely (1) the strength of the feed-forward connection with distributed latency from the eye to the sensory population, (2) the strength of the recurrent connection from the sensory to the eye population and (3) the standard deviation of the noise added to the memory population. Every column indicates a particular combination of feed-forward connection strength and recurrent connection strength. Each row indicates a specific standard deviation of the noise. A wide range of patterns becomes available, including the fanning out of the diagonal seen in the human data of Experiment 2 from Wolff et al. (2017).

Figure axis labels: Feed-forward strength x Recurrent strength; Noise standard deviation. Tick values: 0.004, 0.008, 0.012, 0.016, 0.030.
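The layout of such a sweep, with one column per combination of feed-forward and recurrent strength and one row per noise standard deviation, can be sketched as follows. All parameter values are hypothetical placeholders, and `run_model` is a stand-in that returns a toy matrix rather than running the actual spiking-neuron model; its rule that stronger recurrence widens the band around the diagonal is purely illustrative.

```python
from itertools import product
import numpy as np

# Hypothetical placeholder values, not those used in the thesis.
feedforward_strengths = [0.5, 1.0]
recurrent_strengths = [0.0, 0.5, 1.0]
noise_sds = [0.004, 0.008, 0.012]

def run_model(ff_strength, rec_strength, noise_sd, seed=0):
    """Stand-in for a single model run; returns a toy (times, times)
    matrix in place of a real CTDA."""
    rng = np.random.default_rng(seed)
    t = np.arange(10)
    # Toy rule (illustration only): recurrence widens the diagonal band.
    width = 1.0 + 3.0 * rec_strength
    band = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * width ** 2))
    return ff_strength * band + rng.normal(0.0, noise_sd, size=band.shape)

# Columns: every feed-forward x recurrent combination; rows: noise SDs.
columns = list(product(feedforward_strengths, recurrent_strengths))
grid = {}
for row, noise_sd in enumerate(noise_sds):
    for col, (ff, rec) in enumerate(columns):
        seed = row * len(columns) + col
        grid[(row, col)] = run_model(ff, rec, noise_sd, seed=seed)

print("grid size:", len(noise_sds), "rows x", len(columns), "columns")
```

Each entry of `grid` could then be drawn in the corresponding panel of a figure like the one described above.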