2. Overview
3.6 Analysing Neural Network Recordings
The goal of analysing multiple simultaneously recorded spike trains is to identify the functional connectivity between neurons in the recorded network and so to map that network. This information then allows researchers to formulate and refine computational models describing the operation of the recorded network.
This project presents three visualisations aimed at allowing the researcher to explore a set of simultaneously recorded spike trains with the goal of identifying connectivity between neurons. These are:
1. The spike train raster chart (iRaster),
2. The pairwise cross-correlation grid (iGrid) and
3. The MEA firing animation (iAnimate)
Each of these will be discussed in the relevant implementation chapter but in summary they are used as follows:
3.6.1 iRaster – The classic spike train raster plot.
The iRaster visualisation presents the raw data being analysed as a collection of spike trains plotted as a time series of discrete spiking events. Exploring the dataset in this view is primarily a re-ordering and filtering task. The interactive plot will provide a set of built-in analysis algorithms to order and reorder spike trains (primarily sorting on inter-spike intervals and bursts of spiking activity). The VISA analysis pipeline will provide a means to introduce the researcher’s own custom-developed ordering algorithms. Finally, the raster plot will be able to group and filter individual spike trains. It provides a means to inspect the raw data visually and to identify the recurring patterns that indicate potential connectivity.
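As an illustration of the kind of ordering described above, the following is a minimal sketch that sorts spike trains by mean inter-spike interval. It is not the iRaster implementation; the function names, the dictionary layout and the example data are assumptions made for illustration only.

```python
from statistics import mean


def inter_spike_intervals(spike_times):
    """Return the gaps between consecutive spikes in one train (times in seconds)."""
    ordered = sorted(spike_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def order_by_mean_isi(trains):
    """Return neuron ids ordered from shortest to longest mean inter-spike interval.

    Trains with fewer than two spikes have no interval and are placed last.
    """
    def sort_key(item):
        _, spikes = item
        isis = inter_spike_intervals(spikes)
        return (0, mean(isis)) if isis else (1, 0.0)

    return [neuron_id for neuron_id, _ in sorted(trains.items(), key=sort_key)]


# Hypothetical example data: neuron id -> spike times in seconds.
trains = {
    "n1": [0.10, 0.50, 0.90, 1.30],  # regular, slow firing
    "n2": [0.10, 0.12, 0.14, 0.16],  # short burst
    "n3": [0.70],                    # single spike, no interval
}
raster_order = order_by_mean_isi(trains)
```

A researcher-supplied ordering could follow the same shape: any function that takes the trains and returns a list of neuron ids can drive the row order of the raster.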
3.6.2 iGrid – The Cross Correlation grid
Based on the work of Stuart, Walter and Borisyuk, the cross-correlation grid is a visualisation from which it is possible to determine the neural network’s connectivity (Stuart, Walter & Borisyuk, 2005). This is a computationally intensive visualisation which, while very effective, does not scale well to large data sets. The computational load grows rapidly, roughly with the square of the number of recorded spike trains, because every pair of trains must be compared. The visual grid representation also quickly becomes unusable as neuron counts rise. This project will use parallel computation to increase access to compute power and provide a pre-processing algorithm to handle the computational load. This algorithm will scale effectively from a researcher’s laptop to a high performance compute (HPC) cluster without code modifications. To prevent cognitive overload on the part of the user, a clustering algorithm will identify connected neurons. The clusters will be used to create a dendrogram allowing user navigation of the data set by neuron cluster.
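To make the pairwise cost concrete, the sketch below computes a simple cross-correlogram for every distinct pair of spike trains. It is a plain serial illustration, not the pre-processing algorithm of this project or the method of Stuart, Walter and Borisyuk; the function names, the use of integer millisecond spike times and the example data are all assumptions.

```python
from itertools import combinations


def pair_count(n_neurons):
    """Number of distinct neuron pairs: n(n-1)/2."""
    return n_neurons * (n_neurons - 1) // 2


def cross_correlogram(reference, target, max_lag_ms, bin_ms):
    """Histogram of (target - reference) spike-time differences.

    Spike times are integer milliseconds. Lags in [-max_lag_ms, max_lag_ms)
    are counted into bins of width bin_ms.
    """
    counts = [0] * ((2 * max_lag_ms) // bin_ms)
    for t_ref in reference:
        for t_tgt in target:
            lag = t_tgt - t_ref
            if -max_lag_ms <= lag < max_lag_ms:
                counts[(lag + max_lag_ms) // bin_ms] += 1
    return counts


# Hypothetical spike trains: "b" fires 2 ms after "a", "c" is unrelated.
trains = {
    "a": [100, 200, 300],
    "b": [102, 202, 302],
    "c": [150, 250],
}

# One correlogram per distinct pair; this dictionary is the data behind the grid.
grid = {
    (i, j): cross_correlogram(trains[i], trains[j], max_lag_ms=10, bin_ms=5)
    for i, j in combinations(sorted(trains), 2)
}
```

In this example the pair ("a", "b") shows a peak in the bin covering lags of 0 to 5 ms, while the other pairs are flat. Because each pair is computed independently of the others, the pairs could in principle be distributed across processors, which is the property a parallel pre-processing step would rely on.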
3.6.3 iAnimate – The Multi Electrode Array firing animation
The physical spatial relationship between firing neurons can also be used to visually identify clusters of neurons. By plotting the firing of neurons over time on a 2D plane representing the recording multi-electrode array, clusters can be visually identified. This allows further analysis, such as the cross-correlation grid, to target these neurons. Where position data is available in addition to the spike train data, this animation can be used to identify potentially connected neurons.
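The data preparation behind such an animation can be sketched as binning spike events into time frames and mapping each event to its electrode position. The following is a minimal sketch under assumed conventions (integer millisecond times, electrode positions given as row and column indices); the names and example values are hypothetical and do not describe the iAnimate implementation.

```python
from collections import Counter


def firing_frames(spikes, positions, frame_ms, n_frames):
    """Bin spike events into animation frames.

    spikes:    iterable of (electrode_id, time_ms) events
    positions: electrode_id -> (row, col) on the electrode array
    Returns a list of Counters, one per frame, mapping (row, col) -> spike count.
    Events outside the frame range or without a known position are skipped.
    """
    frames = [Counter() for _ in range(n_frames)]
    for electrode_id, time_ms in spikes:
        index = time_ms // frame_ms
        if 0 <= index < n_frames and electrode_id in positions:
            frames[index][positions[electrode_id]] += 1
    return frames


# Hypothetical layout: two neighbouring electrodes and one distant electrode.
positions = {"e1": (0, 0), "e2": (0, 1), "e3": (7, 7)}
spikes = [("e1", 3), ("e2", 5), ("e1", 12), ("e3", 48)]

frames = firing_frames(spikes, positions, frame_ms=10, n_frames=5)
```

Each frame can then be drawn as a 2D grid, with the count at each position controlling colour or marker size. Neighbouring electrodes that light up in the same frames, as "e1" and "e2" do in the first frame here, are candidates for closer inspection.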
3.6.4 Challenges of big data in neuroscience
As with many fields, the information technology age has had a considerable impact on neuroscience. Modelling the 125 trillion synapses, or connections between neurons, in the brain remains a problem that not even today’s computers can completely solve. Indeed, at present it is not possible to simultaneously record the activity of the 200 million+ neurons. Nevertheless, it is possible to record a subset of these and ask meaningful questions based on the recordings. Extracting answers from the mass of data generated by recording even a small subset of neuron activity requires researchers to confront the ‘big data’ problem. For completeness, it is necessary to define what is meant by the term ‘big data’ and exactly what the problems are in its analysis and presentation.
Big data is a term which has been very poorly defined, usually by salesmen determined to push their product as a solution to extracting useful information. Usually this is a data mining product that attempts to identify sales opportunities from purchasing data or internet browsing histories. Ward and Barker, however, provide a more academically satisfying definition of the term. After reviewing its history and use, they define big data as:
“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning” (Ward & Barker, 2013).
From this definition some key points regarding big data may be extracted:
1. The term big data may be applied to data sets which are:
a. Physically large in terms of data points or
b. Large in terms of complexity, i.e. high data dimensionality even if small in terms of physical size or
c. Both of the above where the data set is physically large and complex.
2. Effective analysis of the data is a non-trivial task requiring considerable computing power. This may include the latest in machine learning and artificial intelligence algorithms.
The recording and analysis of multi-dimensional spike train data for neuroscience clearly falls into 1(c) and 2 above. Such data usually exhibits certain attributes that present challenges to the data analyst. These are usually summarised as (Laney, 2001):
i. Volume – High volume data refers to the number of individual data points being collected. Neuroscience already provides far more potentially collectable data points than can be recorded. Advances in technology over time will serve to make a steadily growing number of data points recordable.
ii. Velocity – High velocity data refers to the rate at which data points are created. As with volume, the rate at which MEAs record the spike train signal, the sampling rate, is growing with time. The simple storage of high velocity data can present a considerable challenge.
iii. Variety – High variety data refers to the range of different sources that might generate spike trains. MEA data is only one method for observing neural network activity. Many other forms of recording exist, such as voltage sensitive dyes, functional magnetic resonance imaging (fMRI) and positron emission tomography (PET). Successful analysis will involve combining data from a variety of sources.
This definition pre-dates the development of the concept of big data. Some organisations argue for adding variability and complexity to the above (SAS, 2014). Given Ward and Barker’s more precise definition of big data, this would seem sensible.
iv. Variability – High variability refers to periodic peaks or bursts of activity which can place burdens both on recording and storing the data. Neural networks in particular generate bursts of high velocity data when subject to stimulation.
v. Complexity – Neuroscience data is unavoidably complex, with many different data encoding schemes and the need to represent all the experiences of a living organism. In addition, neural networks analyse and dictate responses to stimulation as well as giving rise to consciousness in living creatures. That the data will be complex is an unavoidable conclusion, even if science does not yet fully understand how all these processes are achieved.
Bringing to bear the computing resources needed to store and analyse neuroscience data will present a considerable challenge in and of itself.
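To give a rough sense of scale for the volume and velocity attributes above, the raw storage rate of a recording can be estimated as channels × sampling rate × bytes per sample. The figures used below (60 channels, 25 kHz, 16-bit samples) are illustrative assumptions only and are not taken from any particular recording system discussed in this work.

```python
def raw_data_rate(channels, sample_rate_hz, bytes_per_sample):
    """Raw storage rate in bytes per second for an uncompressed recording."""
    return channels * sample_rate_hz * bytes_per_sample


# Assumed, illustrative recording parameters (hypothetical values).
rate = raw_data_rate(channels=60, sample_rate_hz=25_000, bytes_per_sample=2)
per_hour = rate * 3600

print(f"{rate / 1e6:.1f} MB per second")  # 3.0 MB per second
print(f"{per_hour / 1e9:.1f} GB per hour")  # 10.8 GB per hour
```

Under these assumptions a single hour of recording already exceeds 10 GB before any analysis has taken place, and the figure scales linearly with both channel count and sampling rate.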
3.7 Summary
Neuroscience is a complex science combining elements of classical biology, chemistry and electronics to explore the operation of neural networks. As with many fields, the application of technology to the collection and analysis of neural network data has revealed a wealth of new data. However, converting the raw data into usable information challenges even the modern computer with a truly ‘Big Data’ problem. Additionally, this is a relatively young science (100-150 years old); Golgi and Cajal were its founders and they shared the 1906 Nobel Prize. There remains much yet to be learned about the operation of biological neural networks, ranging from data encoding schemes to the operation of neuron clusters as data processing centres. Progress will require software tools and new mathematical algorithms that address the problems of ‘Big Data’ and the analysis of point processes. It will also require a far wider sharing of data recordings, analysis code and expertise (Gibson et al., 2008). Finally, to be usable, the information extracted must be presented in a way that avoids cognitive overload for the user.
Chapter 4: Parallelism
Summary
In this chapter the term “parallel processing” is defined and its use as a means of delivering increased computing performance is examined.
“parallel adj. Of or relating to the simultaneous performance of multiple operations: parallel processing”
4 Overview
The term parallelism has arisen in computing to describe the simultaneous performance of multiple operations. It has emerged as the primary means of delivering increased computing power in the 21st century, but its effective exploitation requires software developers to re-think software design. This has led to the proliferation of software written to historic design standards that fails to fully exploit the available power of the modern computer. It is argued that this waste of computing power must be avoided if the challenges of applying the computer to significant “Big Data” problems are to be met.