Algorithm performance – Lab PC vs High Performance Computing (HPC)

The i-Raster Visualisation

10 Overview of the i-Grid Visualisation

10.1 Overview of Pair-wise Cross Correlation

10.1.4 Algorithm performance – Lab PC vs High Performance Computing (HPC)

The cross-correlation algorithm has been tested on both a standard lab PC and on Plymouth University’s high performance computer cluster (HPC). The performance of the algorithm in each environment is detailed in the relevant sub-section below.

A series of test datasets with the neuron count ranging between 50 and 2000 neurons were generated using a Neural Network Simulator (Borisyuk, 2002). Table 10-1 details the dataset sizes and expected number of cross correlations for each test run.

Neuron Count Cross Correlation Tasks

50 1275 100 5050 150 11325 200 20100 300 45150 400 80200 500 125250 1000 500500 2000 2001000

Table 10-1: Test dataset sizes and number of cross correlations to be performed.

For each dataset the neural simulator generated a two minute simulated recording of spike train data from the network with millisecond resolution for spiking events. Each recording therefore covers a 120,000 millisecond time period.

10.1.4.1 Lab PC Cross-Correlation algorithm performance

The specification for the lab computer used to test the algorithm is as follows:

Item Description

Processor Intel Core i7-2600 CPU @ 3.40GHz

Installed Memory 16.0GB

Operating System Windows 7 Enterprise (64 bit) Service Pack 1

Table 10-2: Lab PC Specification

The primary measure of performance will be the number of cross correlations completed per a second. It is expected that smaller datasets will generate a lower rate of cross correlations per a second as the initialisation and in memory management of data will be a significant portion of the total time spent performing the cross correlation task. For larger dataset the cross-correlations per second rate is expected to rise before become fairly stable as the overhead becomes a smaller percentage of the total compute time. The exact point at which this stabilisation occurs will depend on the hardware used and number of compute cores available. Hence as dataset sizes grow a point of diminishing returns will be reached after which the computation rate becomes static and the total time to complete a run will depend primarily on the number of spike trains being cross correlated. The Intel i7 CPU core used in this test contains four cores which exploit Intel’s hyper threading technology to offer eight virtual compute cores.

The results generated by testing the cross-correlation algorithm in this environment are detailed in Table 10-3 below:

Chapter 10: The i-Grid Visualisation

Page | 151

Neuron Count Cross Correlation Tasks

Total Time (Sec) Per second rate

50 1275 21.84 58.37 100 5050 83.05 60.81 150 11325 172.93 65.49 200 20100 305.34 65.83 300 45150 680.24 66.37 400 80200 1201.63 66.74 500 125250 1872.54 66.89 1000 500500 7445.04 67.23 2000 2001000 29744.07 67.27

Table 10-3: Lab PC Cross-Correlation rates for various sized datasets

As expected the performance of the lab PC plateaued with datasets in excess of 150 spike trains. For dataset sizes of 150 spike trains or less the average number of cross correlation tasks completed per a second was 61.56. This contrasts with the 200 – 2000 spike train range where the completed cross correlation tasks per a second was 66.72. The average across the entire range was 65 cross-correlations per a second.

10.1.4.2 Plymouth University High Performance Computing (HPC) clusters performance

To demonstrate that the cross-correlation algorithm fully utilises the computational resources made available to it the tests performed on the lab PC were then repeated using the high performance computing cluster at Plymouth University. The test run was performed on the universities Fotcluster1(Mills, 2013). Fotcluster1 is a 376 core distributed-memory cluster which comprises of:

1. A ViglenHX425Hi HPC combined head and storage node, plus 46 compute nodes. 2. Each compute node is a single U Viglen HX224i, equipped with Dual Intel Xeon

E5420 (Harpertown) Quad Core 2.50Ghz processors and 8 GB of memory per motherboard.

3. Each node is connected via a 3com 3870 - 48 port network switch. 4. A total of 10TB of storage capacity is available.

For the purpose of this experiment a total of 32 compute cores were made available representing a ‘4x’ increase in compute cores compared to the lab PC system. For the algorithm to have scaled successfully it needs to show approximately a 3x - 4x increase in performance. Note a direct 4x improvement is not expected since the overhead from managing the compute cluster is noticeably larger than that of a multi-core CPU operating on a single motherboard. Table 10-4 sets out the results achieved with the university HPC:

Neuron Count Cross Correlation Tasks

Total Time (Sec) Per second rate

50 1275 10.51 121.37 100 5050 29.04 173.89 150 11325 55.78 203.04 200 20100 92.85 216.47 300 45150 196.48 229.80 400 80200 339.09 236.51 500 125250 525.23 238.47 1000 500500 2031.16 246.41

Page | 152

Neuron Count Cross Correlation Tasks

Total Time (Sec) Per second rate

2000 2001000 8039.12 248.91

Table 10-4: HPC Cross-Correlation rates for various sized datasets

As expected the increased availability of compute cores in the HPC results in a marked increase in performance. Recall that the same algorithm is executing in both situations without any re-coding so any increase in performance must arise from

utilisation of the HPC’s increased resource availability. Table 10-5 contrasts the

performance between the desktop and HPC environments:

Neuron Count Cross Correlation Tasks Desktop HPC HPCx Faster 50 1275 58.38 121.31 2.08 100 5050 60.81 173.90 2.86 150 11325 65.49 203.03 3.10 200 20100 65.83 216.48 3.29 300 45150 66.37 229.79 3.46 400 80200 66.74 236.52 3.54 500 125250 66.89 238.47 3.57 1000 500500 67.23 246.41 3.67 2000 2001000 67.27 248.91 3.70

Table 10-5: Contrast between desktop and HPC performance.

As anticipated the majority of HPC results placed firmly in the middle of the 3 – 4x faster range. The increased overhead from managing the cluster was significant only up to the 150 spike train dataset size. Additionally while the performance of the desktop system plateaued sharply at this point the HPC continued to improve its performance. Nevertheless it was clearly plateauing itself by the time the 2000 spike train dataset was reached. The more gradual onset of diminishing returns in the HPC environment becomes clearly visible if the cross correlation per second rate is charted (see Figure 10-5). Of course the primary benefit of employing a HPC is the ability to compute cross correlations for considerably more spike trains in less time. The increased compute rates of the HPC lead to a direct reduction in the time required to analyse a data set. In the experiments case the desktop required 8h 15min 44.07 seconds to fully process the 2000 spike train data set (2001000 cross correlations) whereas the same data set was processed by the HPC in 2h 13min 59.12seconds.

Chapter 10: The i-Grid Visualisation

Page | 153

Figure 10-5: Cross-correlations per second vs neural network size in desktop and HPC environments

With the cross-correlation operation completed it is now necessary to use these calculations to infer the structure of the neural network that generated the recorded data set.

10.2 Identifying interconnected neural clusters

At its most basic level a neural network is a set of interconnected neurons. These interconnections form clusters of co-operating / communicating neurons which act as data processors. The ultimate goal for i-Grid is to identify these clusters of interacting neurons from the recordings of their firing activity. Cross-correlation provides a measure of the “strength” of the connection between two recorded neurons. A strong cross-correlation signal indicates that the neurons are likely to be connected and form part of a cluster of neurons that work together to process data. Having compiled the cross-correlation data it is now necessary to identify these clusters.

Clustering of data sets can be done in a variety of different ways but the identification of clusters based on connectivity is known as connectivity based clustering / hierarchical clustering. The classic visualisation associated with a hierarchical clustering is the dendrogram. Previous work has shown that iGrid can effectively identify neural network structure(Stuart, Walter & Borisyuk, 2005) but one of its primary drawbacks was its inability to scale to larger data sets. In this case the grid presents so much detail that it becomes visually overwhelming to the user or simply too dense to be effectively displayed. Clearly then for a cross correlation grid to become effective Shneiderman’s Information Visualisation mantra should be applied. This requires the introduction of a means for the user to “overview” the data set and to filter the items included on the grid. The grid itself then becomes the element that delivers the “detail on demand”. In this implementation of the cross correlation

0 50 100 150 200 250 300 0 50 100 150 200 300 400 500 1000 2000 No of Cr o ss -corr ela tion s p er se con d

Neural Network Size

Cross-Correlations per a second.

Desktop vs HPC

Page | 154

grid the “overview” of the data set will be provided by an interactive dendrogram that groups and displays spike train clusters. The user’s interaction will allow the filtering of individual clusters or parts of clusters into / out of the “detailed” cross correlation grid.

The clustering algorithm selected as the foundation for the dendrogram overview is a hierarchical agglomerative clustering solution using complete linkage clustering. The metric employed to determine the distance between clusters is the Euclidean distance between the most significant peaks of the two spike trains normalised cross-correlogram. Alternative linking strategies were considered (single linkage and average linkage) best complete linkage proved the most effective at identifying the connectivity structure of the simulated neural network.

To understand how the selected clustering algorithm function works the terms agglomerative clustering and complete linkage need to be clearly defined:

 Agglomerative clustering is a recursive process where larger clusters are formed by the merging of smaller clusters. At each iteration the algorithm merges the two “closest” clusters to form a new cluster. The “closest” clusters are determined by applying a linkage algorithm to determine a “distance” between clusters

 A linkage algorithm is used to determine how closely two clusters are related. This involves both the selection of a metric to measure closeness and a means of determining that metric for clusters of multiple data sets.

o The selected metric for this implementation of iGrid is the strength of the peak normalised cross correlation bin between two spike trains.

o Complete linkage is used in which each spike train in each of the two clusters under consideration for merger provides a peak cross-correlation value. The largest cross-correlation peak determines how “close” the clusters are with larger peak values indicating greater closeness.

o Other linkage methods were considered but proved less effective in identifying connectivity. The considered methods were:

 Single linkage where a single spike train within each cluster would be taken as representing the cluster. The cross-correlation between these two would be used as a measure of closeness.  Average linkage where the average value of all cross-correlation

peaks between spike-trains in the two clusters is taken as a measure of closeness.

Logically the process of clustering can be continued until all spike trains have been grouped into a single cluster. This would however be inappropriate as connectivity cannot be inferred between neurons where the spike train’s peak cross-correlation bin does not exceed the Brillinger threshold (Brillinger, 1979). The result of clustering can therefore be represented as a series of dendrograms and un-clustered spike trains which did not show a significantly strong correlation signal to be clustered together.

The final output of the cross correlation and clustering algorithm is two JSON encoded data files:

Chapter 10: The i-Grid Visualisation

Page | 155

1. The detailed cross-correlation results encoding all cross-correlation bins for all recorded spike trains as per Figure 10-4. This data will be used to produce the classical iGrid representation.

2. A clustering file containing a series of dendrograms (and un-clustered spike trains) which will be used to provide both an overview of the dataset and a means to control filtering of spike trains into / out of the detailed iGrid view. These two files and the original spike train recordings themselves are passed to the iGrid visualisation module.

In document Visualisation Studio for the analysis of massive datasets (Page 150-155)