12 Conclusions
This chapter reviews the research and the contributions it has made, discusses
future work, and finally assesses the achievements in the context of the original
research question.
12.1 Contributions
This section reviews the various contributions made by the thesis’s software
project. The software draws from the three primary fields of neuroscience, software
engineering and data visualisation, uniting aspects of all three to produce an
extensible software tool for the analysis of neural spike train recordings.
12.1.1 The Visualisation Studio (i-Pipeline)
The visualisation studio exploits the “pipelining” introduced by dataflow / visual
programming languages to create pipelines of data processing activities. These
pipelines process raw data into a form suitable for visualisation. The modern
computer increasingly delivers its computational power through multi-core
processors. To fully utilise this power, a modern software application must be written
in a parallel form, with units of work being performed by multiple compute cores. The
pipelines produced by dataflow programming are amenable to parallelisation, permitting
efficient execution of data processing activities in a multi-core environment.
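The idea can be illustrated with a minimal Java sketch. It is not the Visualisation Studio's execution engine: the class name `PipelineSketch` and its methods are hypothetical, and the sketch assumes a simple linear pipeline whose data items are independent, so that the same chain of stages can be applied to each item on a different core.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: a linear dataflow pipeline applied to independent items in parallel. */
final class PipelineSketch {

    /** Chains the stages, in order, into a single function. */
    static <T> Function<T, T> compose(List<Function<T, T>> stages) {
        Function<T, T> pipeline = Function.identity();
        for (Function<T, T> stage : stages) {
            pipeline = pipeline.andThen(stage);
        }
        return pipeline;
    }

    /** Runs every item through the pipeline; parallelStream spreads the items over the available cores. */
    static <T> List<T> run(List<T> items, List<Function<T, T>> stages) {
        Function<T, T> pipeline = compose(stages);
        return items.parallelStream().map(pipeline).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two toy stages standing in for real data processing activities.
        List<Function<Integer, Integer>> stages = List.of(x -> x + 1, x -> x * 2);
        System.out.println(run(List.of(1, 2, 3), stages));
    }
}
```

Because the stages are held in an ordinary list, adding, removing or re-ordering a stage only changes the list and not the code that executes it.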
Visual programming of a data processing pipeline also permits a researcher to
rapidly introduce, remove or re-order data processing activities to make the
processed data set more amenable to analysis. Visual analysis of the final data set is
the preferred approach and the visualisation studio provides tools to facilitate big
data analytics.
While this research has demonstrated the effectiveness of the visualisation
studio using neuroscience data, the framework itself is generic and can be applied to
almost any field of research.
This research has also served to demonstrate that the researcher’s typical
desktop / laptop computer has the potential to be utilised far more effectively. A great
deal of its computational power is wasted when traditionally developed software is
used for data processing and analysis. This is most clearly demonstrated by the
visualisations produced which now handle thousands and not hundreds of spike
trains while remaining highly interactive.
12.1.2 The Neural Science Problem Domain Library (i-Pipeline)
The creation of the Visualisation studio’s problem domain layer demonstrates
how developers and domain experts working together can create libraries of
algorithms and visualisations. The library’s creator has been left almost completely
free to create a data representation for their problem domain. Any data analysis
algorithm in the problem domain can be “wrapped” into a Visualisation studio
process. The process wrapper ensures the simple deployment and incorporation of
the algorithm into a Visualisation studio dataflow pipeline.
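As an illustration of the wrapping idea only, the following Java sketch shows one possible shape for such a wrapper. The interface `PipelineProcess`, its methods, and the example algorithm are all hypothetical; the actual Visualisation Studio process interface is not reproduced in this chapter.

```java
import java.util.function.Function;

/** Hypothetical process contract; not the actual Visualisation Studio interface. */
interface PipelineProcess<I, O> {
    String name();

    O execute(I input);

    /** Wraps an existing algorithm so that it can be placed in a pipeline. */
    static <I, O> PipelineProcess<I, O> wrap(String name, Function<I, O> algorithm) {
        return new PipelineProcess<I, O>() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public O execute(I input) {
                return algorithm.apply(input);
            }
        };
    }
}

final class ProcessWrapperDemo {

    /** Example domain algorithm: mean inter-spike interval of one sorted spike train. */
    static double meanInterSpikeInterval(double[] sortedSpikeTimes) {
        int n = sortedSpikeTimes.length;
        if (n < 2) {
            return Double.NaN; // undefined for fewer than two spikes
        }
        return (sortedSpikeTimes[n - 1] - sortedSpikeTimes[0]) / (n - 1);
    }

    public static void main(String[] args) {
        PipelineProcess<double[], Double> process =
                PipelineProcess.wrap("Mean ISI", ProcessWrapperDemo::meanInterSpikeInterval);
        System.out.println(process.name() + ": " + process.execute(new double[] {0.0, 1.0, 3.0}));
    }
}
```

The domain algorithm itself knows nothing about the pipeline; only the thin wrapper does, which is what keeps deployment simple.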
Recognition of the modern trend of delivering increased computational power
through a multi-core architecture allows algorithms and visualisations to fully exploit
available compute resources. The implemented problem domain layer demonstrates
this by writing its most computationally expensive algorithm (cross-correlation) in a
way that adapts to available resources. Resources may range from a limited two-core
laptop, through a 4-6 core desktop system, to a full HPC compute cluster.
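A minimal single-machine sketch of this adaptive approach is given below. It is an assumption-laden illustration, not the thesis implementation: the class and method names are hypothetical, the spike trains are assumed to be binned into equal-length count arrays, the statistic computed is simply the peak raw correlation over a range of lags, and the MPJ cluster variant is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: pairwise cross-correlation sized to the cores available at run time. */
final class PairwiseCrossCorrelation {

    /** Raw cross-correlation of two binned trains at one lag: sum over t of a[t] * b[t + lag]. */
    static long correlateAtLag(int[] a, int[] b, int lag) {
        long sum = 0;
        for (int t = 0; t < a.length; t++) {
            int u = t + lag;
            if (u >= 0 && u < b.length) {
                sum += (long) a[t] * b[u];
            }
        }
        return sum;
    }

    /** Peak raw correlation over lags -maxLag..maxLag. */
    static long peak(int[] a, int[] b, int maxLag) {
        long best = Long.MIN_VALUE;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            best = Math.max(best, correlateAtLag(a, b, lag));
        }
        return best;
    }

    /** Computes the symmetric matrix of peak correlations, one task per row of the upper triangle. */
    static long[][] allPairs(int[][] trains, int maxLag) {
        int n = trains.length;
        int cores = Runtime.getRuntime().availableProcessors(); // adapt to the host machine
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            long[][] result = new long[n][n];
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int row = i;
                pending.add(pool.submit(() -> {
                    for (int j = row; j < n; j++) {
                        long value = peak(trains[row], trains[j], maxLag);
                        result[row][j] = value;
                        result[j][row] = value; // symmetric because the lag range is symmetric
                    }
                }));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for completion and surface any failure
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Cross-correlation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] trains = {{1, 0, 1, 0}, {0, 1, 0, 1}, {1, 1, 1, 1}};
        System.out.println(Arrays.deepToString(allPairs(trains, 1)));
    }
}
```

Each task writes to its own cells of the result matrix, so no locking is needed; the same code runs unchanged on a two-core laptop or a many-core workstation.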
Visualisations that apply the techniques of visual analytics have been
re-engineered to also exploit the delivery of increased compute power through multiple
cores. This has allowed visualisations previously limited to presenting hundreds of
spike trains to present thousands while remaining highly interactive. To permit visual
exploration of these large data sets Ben Shneiderman’s Visual Information-Seeking
Mantra of “Overview first, zoom and filter, then details-on-demand” has been applied.
The final result is a significant improvement in the amount of data that can be
processed and effectively visualised on the typical researcher’s computer.
12.1.3 The i-Raster Visualisation
Somerville’s i-Raster visualisation (Somerville et al., 2011) has been
re-engineered. Its many spike train sorting algorithms are now available not only within
the visualisation but also as data pre-processing operations in the pipeline. Parallelisation
of the visualisation’s code has been the key to expanding its ability to present
thousands rather than hundreds of spike trains. Grouping of spike train data has
been used to provide an overview of the larger data set, and a burst sort and
grouping algorithm has been introduced. Interactive zoom and filtering tools permit the visual
examination of detail in the spike train data set. Time filtering permits the detailed
examination of a large data set and the generation of multiple (smaller) data sets.
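Time filtering of this kind can be sketched in a few lines of Java. The sketch below is illustrative only: the class `TimeFilterSketch`, the representation of a spike train as an array of spike times in seconds, and the half-open window convention are assumptions rather than details of i-Raster.

```java
import java.util.Arrays;

/** Hypothetical sketch: extract a smaller data set by keeping only spikes inside a time window. */
final class TimeFilterSketch {

    /** Keeps the spikes of one train that fall in the half-open window [startSeconds, endSeconds). */
    static double[] filterTrain(double[] spikeTimes, double startSeconds, double endSeconds) {
        return Arrays.stream(spikeTimes)
                .filter(t -> t >= startSeconds && t < endSeconds)
                .toArray();
    }

    /** Applies the same window to every train; the trains are independent, so they are filtered in parallel. */
    static double[][] filterAll(double[][] trains, double startSeconds, double endSeconds) {
        return Arrays.stream(trains)
                .parallel()
                .map(train -> filterTrain(train, startSeconds, endSeconds))
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        double[][] trains = {{0.1, 0.5, 1.2}, {0.7, 2.0}};
        System.out.println(Arrays.deepToString(filterAll(trains, 0.5, 1.5)));
    }
}
```

Calling `filterAll` repeatedly with different windows yields the multiple smaller data sets described above, each of which keeps the original train ordering.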
12.1.4 The i-Grid Visualisation
Stuart’s i-Grid visualisation (Stuart, Walter & Borisyuk, 2005) was originally
used to assist researchers in visually identifying clusters of inter-connected neurons
from their spike train recordings. The clustering technique of the original i-Grid has
been expanded to become the foundation of a new “overview” – the cluster
dendrogram. This overview serves to provide a means to zoom and filter i-Grid’s
display, while maintaining a complete overview of the data set.
The computationally expensive component of creating an i-Grid visualisation
is the cross-correlation process. This algorithm has been re-coded to fully exploit the
delivery of compute power through multi-core systems. In addition, it has been
rewritten to use Message Passing Java (MPJ), allowing it to effectively utilise high
performance compute (HPC) clusters. All of this is wrapped into a Visualisation
studio pipeline process that allows even this complex algorithm to be rapidly used in
any data processing pipeline.
Chapter 12: Evaluation and the Way Forward
12.1.5 The i-Animate Visualisation
i-Animate is a new visualisation that creates a representation of the modern
(large) multi-electrode array used to simultaneously record spike trains. An animation
of the recorded neuron spiking events over time is provided, allowing the researcher
to visually identify potentially connected neurons. Overall and time-filtered views of
the electrodes’ recorded activity levels are available through a selection of heat maps.
Available heat maps range from the classic (but visually ineffective) rainbow heat
map to Moreland’s “Diverging Colour Map for Scientific Visualization”.
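To indicate how a diverging map assigns colours to activity levels, a simplified Java sketch follows. It is a stand-in, not Moreland's method: Moreland interpolates in a perceptual (Msh) colour space, whereas this sketch interpolates linearly in RGB. The three anchor colours are assumed to be the cool, neutral and warm values commonly quoted for his cool-to-warm map, and the class name is hypothetical.

```java
/** Hypothetical sketch: a simplified diverging (cool, neutral, warm) colour map using linear RGB interpolation. */
final class DivergingColourMap {

    // Assumed anchor colours (red, green, blue); see the caveat in the accompanying text.
    private static final int[] COOL = {59, 76, 192};
    private static final int[] NEUTRAL = {221, 221, 221};
    private static final int[] WARM = {180, 4, 38};

    /** Maps a normalised activity level in [0, 1] to an RGB triple; values outside the range are clamped. */
    static int[] colourFor(double level) {
        double clamped = Math.max(0.0, Math.min(1.0, level));
        return clamped < 0.5
                ? interpolate(COOL, NEUTRAL, clamped * 2.0)
                : interpolate(NEUTRAL, WARM, (clamped - 0.5) * 2.0);
    }

    /** Linear interpolation between two RGB triples, with fraction in [0, 1]. */
    private static int[] interpolate(int[] from, int[] to, double fraction) {
        int[] rgb = new int[3];
        for (int k = 0; k < 3; k++) {
            rgb[k] = (int) Math.round(from[k] + fraction * (to[k] - from[k]));
        }
        return rgb;
    }

    public static void main(String[] args) {
        for (double level : new double[] {0.0, 0.25, 0.5, 0.75, 1.0}) {
            System.out.println(level + " -> " + java.util.Arrays.toString(colourFor(level)));
        }
    }
}
```

Unlike a rainbow map, a diverging map has a single neutral midpoint, so low and high activity are read as departures in opposite directions from it.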
12.2 Future Development
The Visualisation Studio and the implemented neuroscience problem domain
library represent a significant change in the way data is processed and visualised.
However, modern technology still provides avenues through which greater gains can
be realised. This section examines some of these avenues to advance the data
processing elements and performance of the Visualisation Studio.
12.2.1 Exploitation of high performance GPU hardware
As described in chapter 4, techniques to code applications in a manner that
makes full use of the modern computer’s multiple compute cores have lagged far
behind the hardware’s ability to deliver increased performance. This research has
insisted that the application generated should make full use of multiple compute
cores as a means of increased performance delivery. This has been taken to its
furthest extreme with the computationally intensive pairwise cross-correlation
calculations on which the i-Grid visualisation is based. The algorithm as implemented
can be run on any computer supporting an MPJ installation from the humble laptop
to the HPC cluster without modification. The algorithm will determine the available
compute cores, assign cross-correlation operations and process results completely
independently of the user. Most users will not have the benefit of access to an HPC
cluster at will. The modern graphics processing unit (GPU) has already
revolutionised the field of interactive data visualisation. Now the computing power of
the graphics card is being opened up to accelerate scientific, analytics, engineering,
consumer, and enterprise applications. This provides even a simple laptop with
access to what is effectively a mini-HPC cluster. Programmatic access to GPUs is
now available through application programming interfaces (APIs) such as:
NVIDIA’s Compute Unified Device Architecture (CUDA),
the Khronos Group’s Open Computing Language (OpenCL), and
the OpenMP Architecture Review Board’s Open Multi-Processing (OpenMP) API.
The Visualisation Studio is a cross-platform, Java-based application entirely
capable of using these tools either through Java bindings or by invoking native code
from Java using the Java Native Interface (JNI). This opens the possibility of using
these mini-HPC clusters, present in almost all modern computers, to perform
complex and computationally intensive data analysis in addition to their more
traditional roles in producing interactive displays. The Visualisation Studio could also
optionally implement core components to run in such an environment. The parallel
execution engine would be a prime candidate for such a conversion.
12.2.2 Distributed Computing and Cloud Computing
This research has focused on the production of an application deployed to the
researcher’s desktop. However, modern technology affords other options, such as
distributed or cloud computing, as a means to bring increased compute power to bear
on a problem. The Apache Hadoop framework is an open-source, Java-based framework
that facilitates the distributed processing of very large data sets. Interfacing the
Visualisation Studio with the Hadoop framework would offer access to HPC-scale
compute power even in the absence of an HPC or GPU option.
12.2.3 Application to other problem domains
Finally, while the field of neuroscience is used in this research as the primary
problem domain, the developed Visualisation Studio application has been designed
from the ground up as a general solution that can be applied to many different
problems. Science and nature are replete with problems that are “embarrassingly
parallel”. Such problems are well suited to the hardware and software tools (such as
the Visualisation Studio) that are now emerging. Alternative problem domains and
computationally intensive tasks which could be the subject of an implementation of
the Visualisation Studio’s problem domain layer might include:
Financial analysis and reporting.
Event simulation in particle physics.
Ensemble calculations of numerical weather prediction.
Genetic algorithms and many other evolutionary computing techniques.
Brute-force searches in cryptography.
Computer simulations comparing many independent scenarios, such as climate models.
Fluid mechanics.
12.3 Conclusion
This research began by asking the question “How can Software Engineering
and Visual Analytics be applied to aid the general analysis of scientific data and
specifically current neural spike train data?” The development and testing of the
Visualisation Studio application has shown that:
Software tools that rely on delivering compute power from a single, ever-faster CPU can only provide limited interactive data visualisations.
Software tools that embrace the delivery of compute power through multi-core hardware and a parallel programming model can offer greatly enhanced interactive data visualisations.
Practical parallel programming models can emerge from dataflow programming’s pipeline model and the creation of a visual programming language (VPL) for the problem domain under study.
These VPL programs can tackle complex data analysis tasks while being efficiently executed on modern multi-core computer systems.
Interactive data visualisations can manage the resulting “big data sets” even on limited desktop hardware. This is achieved by combining Software Engineering with the techniques of Visual Analytics. In the case of this research, the amount of data visualised increased by a factor of 10, from hundreds to thousands of spike trains.
In future, the most effective research will combine the efforts of a multi-disciplinary team to produce valuable results. At the core of these teams will be a problem domain expert and a software engineer.