On the Degree of Learning-Difficulty - Causal pattern inference from neural spike train data

The assessment framework can be used to test the performance of a technique over a wide range of parameters, for example those of the neural simulation. Learned networks might turn out to be satisfactory for some parameter combinations but not for others. If the neural simulation were realistic and the parameters of a real system under study were known, one could test whether the technique delivers good networks in practical application. Unfortunately, the two conditions will rarely both be met: computational constraints can impede neural network simulations on a realistic scale (Section A.1), and very few biological systems are known well enough to correctly parameterise detailed models of them. In much simplified models, on the other hand, parameters cannot be related to the studied system; this is the case for the neuron models that will be used in this thesis. In order to relate simulation results to the biological system, an abstract measure that characterises the data is proposed, which assesses the difficulty of network inference. It is thereby possible to roughly estimate the expected performance of the SSS on real spike trains.

Inference from data is a common task whose difficulty can vary within a wide range. How complicated it is to draw conclusions from data depends on many factors, such as the particular conclusion and the amount of information about it in the data. For the problem of network inference from spike trains this means: learning a good network mainly depends on the complexity of the underlying system, the informativeness of the spike train data, and the budget of computing time. Hence, network inference from a simple two-neuron simulation poses a much easier problem than inference from a large-scale network simulation with realistic observability and noise levels. But what exactly determines the severity of the network learning task? How can the identified relevant factors be quantified? And to what extent does the performance of network inference depend on these? Two factors concerning these questions play a major role within the scope of the assessment framework: the complexity of the simulated system and data quality with respect to information about network connectivity. The subsequent sections discuss these two aspects and propose measures such that the dependency of learning performance on the relevant factors can be quantified.

6.2.1 Complexity of the Simulation

The complexity of the simulated neural system depends on its components: the number of neurons, their connectivity, and the dynamics of each individual neuron. Neurons seem to exhibit a rather limited behavioural repertoire compared to the complex behaviour of higher vertebrates. Some models simplify neural dynamics even further, down to a few key aspects. For example, leaky integrate-and-fire models can replicate certain observations of spiking neurons; however, they are unable to describe sub-threshold dynamics, for which conductance-based models are needed (Section A.1). The latter kind may thus show more complex behaviour, which is important when connecting several neurons to neural networks: the more complex the building blocks of the network are, the more complex its overall dynamics can be. This means that increased model complexity can make interactions between elements appear more probabilistic than with simplistic neuron models. Because it is generally harder to detect probabilistic relations than deterministic ones, this affects network inference. The type of neuron model used to simulate neural networks can thus have an impact on network learning performance.

The brain is a good example of how network size influences system complexity: a single neuron seems to be capable of only a few computational operations, but combining many such relatively simple units can yield a highly complex system with a huge variety of abilities. However, sheer network size (i.e. the number of neurons and of links between them) does not explain this emerging complexity; the actual connectivity patterns must also be taken into account. Such patterns, also called motifs, have been shown to be able to influence the dynamics of a neural system [Sporns et al., 2000, Sporns and Tononi, 2002, Galan, 2008, Bullmore and Sporns, 2009]. Reliably quantifying motifs with existing measures (see e.g. Strogatz [2001], Albert and Barabasi [2002], Sporns [2003], Costa et al. [2007]) commonly requires very large networks (> 1,000 nodes). These measures cannot be applied to the relatively small networks used in simulations within this framework; networks are therefore characterised in an ad hoc manner. For example, in a strict feed-forward network without any recurrent loops, activity can only flow in a predefined direction. By comparison, a network consisting of interconnected clusters of nodes with recurrent connections within each cluster can exhibit a larger variety of fundamental states (e.g. combinations of neuron groups that are active simultaneously) or different orders of sequential cluster activation. The cluster network can thus generate data with a larger variety in dynamics than the feed-forward network, and network inference from these more complex data is expected to be a harder problem than for the feed-forward dynamics. The topology of the golden network can thus have an impact on the performance of network learning.
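The structural difference between the two example topologies can be made concrete with a small sketch. The constructors `feed_forward` and `clustered` and the check `has_recurrent_loop` are hypothetical helpers introduced here for illustration only; they are not part of the assessment framework, and the particular wiring (a simple chain, and fully connected clusters linked in a ring) is an assumption.

```python
from collections import deque

def feed_forward(n):
    """Chain 0 -> 1 -> ... -> n-1 without any recurrent links (assumed layout)."""
    return {i: ([i + 1] if i + 1 < n else []) for i in range(n)}

def clustered(n_clusters, size):
    """Clusters with all-to-all recurrent links inside each cluster,
    plus one link from each cluster to the next one (assumed layout)."""
    adj = {}
    for c in range(n_clusters):
        members = [c * size + k for k in range(size)]
        for u in members:
            adj[u] = [v for v in members if v != u]
        if n_clusters > 1:
            next_first = ((c + 1) % n_clusters) * size
            adj[members[-1]].append(next_first)
    return adj

def has_recurrent_loop(adj):
    """Kahn's algorithm: a directed graph contains a cycle exactly when
    not every node can be placed in a topological order."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    queue = deque(u for u, d in indegree.items() if d == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return ordered < len(adj)

print(has_recurrent_loop(feed_forward(5)))   # feed-forward chain
print(has_recurrent_loop(clustered(3, 3)))   # clustered, recurrent network
```

The check only separates loop-free from recurrent wiring; it does not quantify how much richer the dynamics of the clustered network are.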

6.2.2 Data Informativeness

Studying a system involves collection of corresponding data. Using these data for inference about the system can vary in difficulty, which depends on how much information the data conveys about the system: First of all, the data must be relevant with respect to aspects of interest. And additionally, quality and quantity of data affect how much information it can maximally convey. For example, a noise free data set with high temporal resolution can be more informative than one with high noise levels collected at a low sampling rate [Nyquist, 1928, Shannon, 1949]. Likewise, large data sets are more likely to contain several observations of a particular effect than just a few data-points in which the effect might only be observed once. Observing the effect repeatedly can strengthen belief in its existence by weakening the alternative of having simply observed an artefact.

Factors like data length or noise level are easily controlled in a simulation environment; controlling the relevance of the data, however, may not be straightforward. With respect to the assessment framework, it might seem surprising that a spike train generated by a simulated neural network should be controlled and quantified with respect to its relevance; but recalling that causal interactions are to be inferred from the data shows why this makes sense. Suppose the neural simulation of the golden network were parameterised such that post-synaptic potentials induced by connected neurons would never suffice to evoke a spike in the receiving neuron. All model neurons would be spontaneously active, but no spikes other than these random ones would be observed. Hence, spike trains would convey information about the rates of spontaneous activity of the neurons, but this information is irrelevant with respect to causal interactions between modelled units. The lack of information about network connectivity in random spike trains makes them useless for network inference, and expecting that sensible relations could be recovered by any method is unreasonable. On the other hand, if the data contain sufficient indications of the interplay of units, network inference should be expected to reveal these interactions. In order to ensure that generated spike trains are enriched with relevant information about their underlying network, the simulation must be parameterised such that post-synaptic potentials are sufficiently excitatory. Excitations can then initiate spikes that convey information about network connectivity through their temporal correlation.[1]
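The thought experiment above can be illustrated with a toy simulation. The following is a minimal sketch under strong assumptions, not one of the neuron models used in this thesis: time is discrete, neuron A spikes only spontaneously, and neuron B spikes either spontaneously or, one step after an A spike, with a probability `p_evoke` that stands in for the strength of the post-synaptic potential. The names `simulate_pair`, `lagged_excess`, `p_spont` and `p_evoke` are hypothetical.

```python
import random

def simulate_pair(n_steps, p_spont, p_evoke, seed=0):
    """Toy model of a connection A -> B; returns both spike trains and
    the number of B spikes that were evoked by A."""
    rng = random.Random(seed)
    a, b = [0] * n_steps, [0] * n_steps
    n_evoked = 0
    for t in range(n_steps):
        a[t] = int(rng.random() < p_spont)          # A: spontaneous only
        if rng.random() < p_spont:                  # B: spontaneous spike
            b[t] = 1
        elif t > 0 and a[t - 1] and rng.random() < p_evoke:
            b[t] = 1                                # B: spike evoked by A
            n_evoked += 1
    return a, b, n_evoked

def lagged_excess(a, b):
    """P(B spikes at t+1 | A spiked at t) minus the overall spike
    probability of B; close to zero when the trains are unrelated."""
    n_a = sum(a[:-1])
    if n_a == 0:
        return 0.0
    hits = sum(1 for t in range(len(a) - 1) if a[t] and b[t + 1])
    return hits / n_a - sum(b) / len(b)

# PSPs too weak to ever evoke a spike: only spontaneous activity remains.
a, b, n_evoked = simulate_pair(20000, p_spont=0.05, p_evoke=0.0)
print(n_evoked, lagged_excess(a, b))

# Sufficiently excitatory PSPs: spikes of B follow spikes of A.
a, b, n_evoked = simulate_pair(20000, p_spont=0.05, p_evoke=0.8)
print(n_evoked, lagged_excess(a, b))
```

In the first setting the connection exists in the model but leaves no trace in the data, so no method could recover it; in the second, the temporal correlation between the two trains carries the information that network inference relies on.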

The difficulty of network inference varies depending on how distinctly relations between units are reflected in the data. For a fair assessment of an analysis technique, the severity of the problems it is applied to needs to be taken into account. Therefore, the spike train data are quantified with respect to the relevant information they contain about the neural connectivity.

Quantifying the Informative Value of Spike Trains

The neural simulation of the golden network yields spike trains, which shall be characterised with respect to their informativeness about the network's connectivity. Parameters of the neural simulation (spontaneous activity level, synaptic efficiency) control the ratio of uncorrelated spontaneous spikes to those evoked by post-synaptic potentials. As explained earlier, this mixture of uncorrelated and correlated spikes determines the amount of information conveyed about the network. In order to quantify the degree of informativeness, each spike train is characterised by the proportion of evoked to spontaneous spikes.

Definition 5 (Impetus) The impetus of a spike train is the relative increase in the number of spikes due to spikes evoked by post-synaptic potentials, measured against the number of spontaneous spikes:

\[
\text{impetus} = 100 \cdot \frac{\#\,\text{evoked spikes}}{\#\,\text{spontaneous spikes}}\;\% . \tag{6.1}
\]

If, for example, impetus = 0% then no spikes are evoked at all and the spike train consists only of spontaneous spikes, i.e. uncorrelated random spikes. For impetus = 100% the spike train is a mixture of two halves: spontaneous spikes and evoked spikes. Thus, when the impetus is low the data are similar to uncorrelated spike trains, representing the inherent spontaneous activity of the model neurons. A higher impetus indicates a more autonomous system with higher self-dynamics. In such systems, the spike trains are more informative about network connectivity, which is expected to improve network recovery.
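Equation (6.1) translates directly into code when the simulation labels each spike as evoked or spontaneous. The sketch below assumes that such counts are available; the function names `impetus` and `evoked_fraction` are hypothetical and chosen for illustration.

```python
def impetus(n_evoked, n_spontaneous):
    """Impetus in percent, following equation (6.1)."""
    if n_spontaneous <= 0:
        raise ValueError("impetus is undefined without spontaneous spikes")
    return 100.0 * n_evoked / n_spontaneous

def evoked_fraction(impetus_percent):
    """Share of evoked spikes among all spikes. This follows from (6.1):
    evoked / (evoked + spontaneous) = impetus / (100 + impetus)."""
    return impetus_percent / (100.0 + impetus_percent)

print(impetus(0, 200))          # 0.0: spontaneous spikes only
print(impetus(200, 200))        # 100.0: as many evoked as spontaneous spikes
print(evoked_fraction(100.0))   # 0.5: the "two halves" case
```

Note that the impetus is not bounded by 100%: a spike train with more evoked than spontaneous spikes has an impetus above 100%.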

Unfortunately, the impetus cannot be calculated for real data.[2] If the impetus were known, the quality of networks learned from recorded data could be appraised by the results of simulations with a similar impetus. However, the concept may be simple enough to enable experimenters to roughly estimate the impetus of their data. For example, recordings within a feed-forward type structure are expected to exhibit a higher impetus than recordings from an area with many converging external inputs.

[1] Inhibition can also result in informative correlations, by impeding spontaneous activity, for example.

[2] Franziska Matthäus and William Heitler noted that certain experimental set-ups might facilitate the determination of the impetus: if chemical synaptic transmission is pharmaceutically blocked, observed activity corresponds to intrinsic spiking (neglecting influences of electrical synapses); comparing this baseline activity to recordings where synaptic transmission is unblocked can yield a reasonable estimate of the system's impetus.
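The comparison of blocked and unblocked recordings suggested by Matthäus and Heitler could be turned into a rough estimate along the following lines. This is a sketch under assumptions that are not stated in the source: the spontaneous firing rate is taken to be the same in both conditions, and every spike in excess of the blocked baseline is counted as evoked. The function name `estimate_impetus` and its parameters are hypothetical.

```python
def estimate_impetus(n_unblocked, t_unblocked, n_blocked, t_blocked):
    """Rough impetus estimate (percent) from spike counts and recording
    durations with synaptic transmission unblocked and blocked.

    Assumptions: the blocked recording shows only spontaneous spikes, and
    the spontaneous rate is unchanged when transmission is unblocked."""
    if min(t_unblocked, t_blocked) <= 0 or n_blocked <= 0:
        raise ValueError("need positive durations and a non-zero baseline")
    rate_total = n_unblocked / t_unblocked   # spontaneous + evoked spikes
    rate_spont = n_blocked / t_blocked       # baseline: spontaneous only
    return 100.0 * (rate_total - rate_spont) / rate_spont

# 300 spikes in 60 s unblocked versus 200 spikes in 60 s blocked
# gives an estimate of roughly 50%.
print(estimate_impetus(300, 60.0, 200, 60.0))
```

If inhibition dominates, the unblocked rate can fall below the baseline and the estimate becomes negative; the sketch does not handle that case and it would need separate treatment.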

Measuring the impetus of a spike train gives an indication of how informative the data are about network connectivity, which correlates with the severity of the network inference task. The starting conditions for learning networks can thus be rated; how they can be related to the quality of learned networks is discussed next.
