3.5 Endnote
4.3.8 Quality of Algorithm Performance
To assess the quality of the three algorithms' performance, a Jaccard Index was used to compare algorithm results to a visual analysis. Briefly, the Jaccard Index measures the similarity between two sequences of events (see Materials and Methods). When an algorithm detected events that closely matched those identified by visual inspection, its Jaccard Index was close to 1 (100%), and it was considered to have performed well. When the events detected by an algorithm did not match those identified by visual inspection, the Jaccard Index was close to 0 (0%), and it was considered to have performed poorly. The performance of the three algorithms was measured across a variety of neural spiking activity, including rhythmic bursting, tonic spiking, and mixtures of bursting and tonic spiking. Because it would be difficult and time-consuming to recalibrate algorithm parameters for every spike train, the parameters were optimized and the same values were used across data sets (see Materials and Methods). Parameter optimization was accomplished using a two-step search. The first step was a coarse sampling of algorithm performance over a wide range of values based on the originally published algorithm parameters. The second step sampled a narrower range of parameter values at higher resolution, around the best-performing values from the first step. The final parameter values were those that yielded the highest Jaccard Index across the different types of spiking activity.
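The comparison and the two-step parameter search described above can be sketched as follows. This is a minimal sketch and not the implementation used in this work. It rests on several assumptions:

- Algorithm output and visual annotations are binned onto a common time grid as binary sequences.
- The search objective is the mean Jaccard Index across data sets. The text only states that the highest Jaccard Index across spiking activity types was used.
- The fine search spans one coarse step on either side of the best coarse value.

The functions `jaccard_index`, `two_step_search`, and `mean_jaccard`, and the toy threshold detector, are hypothetical.

```python
import numpy as np


def jaccard_index(a, b):
    """Jaccard Index |A ∩ B| / |A ∪ B| of two binary event sequences.

    a[i] and b[i] are 1 when time bin i lies inside an event (e.g. a burst).
    Assumption: two sequences with no events at all count as a perfect match.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("sequences must share the same time bins")
    union = np.count_nonzero(a | b)
    if union == 0:
        return 1.0
    return np.count_nonzero(a & b) / union


def two_step_search(score, lo, hi, n_coarse=11, n_fine=11):
    """Return the parameter value in [lo, hi] that maximizes score(p).

    Step 1 samples the whole range coarsely. Step 2 resamples one coarse
    step on either side of the best coarse value at a finer resolution.
    """
    coarse = np.linspace(lo, hi, n_coarse)
    best = max(coarse, key=score)
    step = (hi - lo) / (n_coarse - 1)
    fine = np.linspace(max(lo, best - step), min(hi, best + step), n_fine)
    return float(max(fine, key=score))


# Hypothetical usage: tune the threshold of a toy detector ("signal > thr")
# so that its events best match reference labels across three data sets.
rng = np.random.default_rng(0)
signals = [rng.random(200) for _ in range(3)]
references = [s > 0.6 for s in signals]  # stand-in for visual labels


def mean_jaccard(thr):
    return np.mean([jaccard_index(s > thr, r) for s, r in zip(signals, references)])


best_thr = two_step_search(mean_jaccard, 0.0, 1.0)  # close to 0.6
j = jaccard_index([0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0, 0, 1, 1, 1, 0])  # 4 shared / 6 total bins
```

The fine step only refines a value the coarse step already found. If the score has several separated peaks, a coarse grid that is too sparse can miss the best one.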
In general, bursts detected by the PS method were up to 50% similar to burst events identified by visual inspection (Fig. 4.6, blue bars). The algorithm performed best on in vitro data that were either bursting or spontaneously active and performed poorly (0 – 15% similarity) on data that transitioned between activity types. For simulation data that were either bursting or a combination of bursting and tonic or spontaneous activity, PS detected bursts that were 25 – 40% similar to those identified by visual inspection.
Performance of the CMA method varied across test data (Fig. 4.6, green bars). Its best performance was 95% similarity for a rhythmically bursting spike train. Moderate performance (30 – 60% similarity) was achieved for ambiguous activity in both in vitro and simulated data, as well as for spike trains with spontaneous bursting or with both bursting and tonic spiking activity. CMA performed poorly (0 – 30% similarity) on spike trains that included large amounts of tonic spiking activity.
The Jaccard Index for EHV classification showed that the algorithm detected burst events that were up to 90% similar to visually identified events (Fig. 4.6, pink bars) and tonic spiking events that were up to 100% similar. EHV detected bursts that were more than 80% similar to visually identified events in data that were rhythmically bursting, mostly tonic spiking, transitioning, or phasotonically bursting. Moderate burst detection (40 – 80% similarity) was achieved by EHV for data that were ambiguous (neither clearly bursting nor clearly tonic spiking), for in vitro bursting data, and for a mixture of bursting and tonic spiking. For in vitro data that were not clearly bursting or tonically spiking, EHV detected bursts that were about 20% similar to visual inspection.
Because the algorithm distinguishes between bursting and tonic spiking events, a separate Jaccard Index was evaluated for EHV tonic spiking detection (Fig. 4.6, magenta bars). For simulation data visually classified as exclusively tonic, transitioning, or mostly tonic, as well as for exclusively tonic in vitro data, EHV identified tonic events that were over 80% similar to visual inspection. Tonic detection in phasotonic and mixed activity was 70 – 80% similar, whereas the algorithm detected tonic activity that was only 0 – 20% similar in ambiguous simulation data and spontaneous in vitro data.
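The separate burst and tonic scores can be illustrated with a per-class Jaccard Index. This is a hedged sketch, not EHV's actual scoring procedure. It assumes that both the EHV output and the visual annotations are reduced to one label per time bin. The label codes (`QUIET`, `BURST`, `TONIC`) and the `class_jaccard` helper are hypothetical.

```python
import numpy as np

# Hypothetical per-bin label codes for a classifier that separates
# bursting from tonic spiking.
QUIET, BURST, TONIC = 0, 1, 2


def class_jaccard(labels_algo, labels_visual, cls):
    """Jaccard Index restricted to the bins labeled `cls` in either sequence."""
    a = np.asarray(labels_algo) == cls
    b = np.asarray(labels_visual) == cls
    union = np.count_nonzero(a | b)
    if union == 0:
        return 1.0  # assumption: a class absent from both labelings is a match
    return np.count_nonzero(a & b) / union


visual = [BURST, BURST, BURST, QUIET, TONIC, TONIC, TONIC, TONIC, QUIET, QUIET]
algo = [BURST, BURST, QUIET, QUIET, TONIC, TONIC, TONIC, BURST, QUIET, QUIET]
burst_score = class_jaccard(algo, visual, BURST)  # 2 shared / 4 bins -> 0.5
tonic_score = class_jaccard(algo, visual, TONIC)  # 3 shared / 4 bins -> 0.75
```

In this toy example, the algorithm labels one visually tonic bin as a burst. That single disagreement lowers both scores, because it enlarges the burst union and removes a bin from the tonic intersection.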
Figure 4.6. Quantitative comparison of algorithm performance
The Jaccard Index compares two sequences of binary events. A Jaccard Index close to 0 indicates poor performance whereas a value close to 1 indicates good performance. A Jaccard Index was calculated for all three algorithms to measure their performance in comparison to visual analysis for each spike train. The Extended Hill-Valley method generally outperformed the other two methods for both burst (pink) and tonic spiking (magenta) detection. The
Cumulative Moving Average method (green) performed best on rhythmically bursting data and poorly on spike trains with tonic activity. The Poisson Surprise method (blue) performed
moderately well for ambiguous in vitro data.
[Figure 4.6 plot: Jaccard Index (0.0 to 1.0) for Poisson Surprise, Cumulative Moving Average, EHV bursts, and EHV tonic spiking on each spike train: Ambiguous Activity (in vitro), Tonic Spiking (in vitro), Spontaneous Bursting (in vitro), Ambiguous Activity, Tonic Spiking, Mostly Tonic Spiking, Phasotonic Bursting, Bursting & Tonic Spiking, Transitioning Regimes, Rhythmic Bursting.]
4.4 Discussion
Defining bursts based on the distribution of inter-spike intervals (ISIs) has proven to be fast and powerful, but it has also resulted in a fractured approach to classifying neural activity. Bursts that are defined by adaptive ISI threshold methods (e.g., Cumulative Moving Average, CMA) are not necessarily comparable to those that are detected by probability-based methods (e.g., Poisson Surprise, PS). In the results presented here, the bursts detected by different
algorithms did not always correspond to each other or to events identified by eye. In addition, because different types of neural activity have different ISI distributions, it is not clear how to find a parameter range for either an adaptive ISI threshold method or a probability-based method that translates well across data sets and is resilient to changes in baseline spike frequency. Finally, these ISI-based burst detection algorithms, by definition, do not yield results that discriminate between bursting and tonic activity.
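The contrast between the two ISI-based families can be made concrete with a small sketch. The Poisson surprise below follows the standard definition S = -ln P. Here P is the probability that a Poisson process firing at the spike train's mean rate produces at least n spikes in the observed window.

The CMA-style threshold is a deliberate simplification. It uses a fixed `alpha`, whereas the published CMA method chooses its scaling factors from the skewness of the CMA curve. The function names, default parameters, and toy ISIs are all hypothetical; they are not the implementations evaluated here.

```python
import math

import numpy as np


def poisson_surprise(n_spikes, duration, mean_rate):
    """Poisson surprise S = -ln P of observing n_spikes within `duration` s.

    P is the probability that a Poisson process firing at `mean_rate` Hz
    (the spike train's baseline rate) emits at least n_spikes in the window.
    """
    if n_spikes <= 0:
        return 0.0  # P(at least 0 spikes) = 1, so nothing is surprising
    lam = mean_rate * duration  # expected spike count in the window
    if lam <= 0:
        raise ValueError("mean_rate and duration must be positive")
    # Sum the upper tail e^-lam * lam^i / i! for i >= n_spikes in log space;
    # terms far beyond max(n_spikes, lam) are negligible and are dropped.
    i_max = int(max(n_spikes, lam) + 20 * math.sqrt(lam) + 50)
    i = np.arange(n_spikes, i_max + 1)
    log_terms = -lam + i * math.log(lam) - np.array([math.lgamma(k + 1) for k in i])
    peak = log_terms.max()
    log_p = peak + math.log(np.exp(log_terms - peak).sum())
    return max(0.0, -log_p)


def cma_isi_threshold(isis, alpha=0.5, bin_width=0.01):
    """Simplified CMA-style adaptive ISI threshold, in seconds.

    Histogram the ISIs, take the cumulative moving average (CMA) of the bin
    counts, and return the upper edge of the first bin after the CMA peak
    where the CMA falls to alpha * peak. Assumption: alpha is fixed here.
    """
    isis = np.asarray(isis, dtype=float)
    n_bins = max(1, int(math.ceil(isis.max() / bin_width)))
    counts, edges = np.histogram(isis, bins=n_bins, range=(0.0, n_bins * bin_width))
    cma = np.cumsum(counts) / np.arange(1, n_bins + 1)
    peak = int(np.argmax(cma))
    below = np.nonzero(cma[peak:] <= alpha * cma[peak])[0]
    idx = peak + int(below[0]) if below.size else n_bins - 1
    return float(edges[idx + 1])


# Toy comparison on hypothetical numbers.
# Same 5 spikes and 5 Hz baseline, different window lengths:
s_dense = poisson_surprise(5, 0.02, 5.0)  # 5 spikes in 20 ms: very surprising
s_sparse = poisson_surprise(5, 1.0, 5.0)  # 5 spikes in 1 s: expected at 5 Hz
# Rhythmic bursting: many 5 ms intra-burst ISIs, fewer 200 ms inter-burst ISIs.
isis = [0.005] * 40 + [0.2] * 20
threshold = cma_isi_threshold(isis)  # ISIs below this count as burst ISIs
```

The surprise of the same five spikes depends on the baseline rate and the window length. The CMA threshold depends only on the shape of the ISI histogram. This is one reason the two families need not agree with each other.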
The Extended Hill-Valley (EHV) analysis method uses a smoothed, history-dependent analysis signal to classify neural activity and can detect bursts and bouts of tonic spiking across data sets without changing the parameters. EHV, however, has a larger number of parameters that require calibration to determine how features of bursts and tonic spiking are defined. Once calibrated, classification of neural activity by EHV outperformed both PS and CMA on a range of spike train patterns. For spike trains that were selected because of their ambiguous activity, however, all three classification algorithms yielded results that differed from events classified by visual inspection (similarity of ~50% or less). The classification results for ambiguous activity underscored the difficulties and nuances of objectively defining a burst or tonic spiking.