Comparison to other classification methods

The performance of the BBCa was compared to that of two other classification methods: support vector machines (SVM) and Gaussian process classification (GPC). SVMs [88, 89] have enjoyed great popularity due to their success in many applications, e.g. handwritten digit recognition [12] and 3D object recognition [8]. They operate by mapping the feature vectors $\vec{w}$ (see section 4.2) into a higher-dimensional (possibly infinite-dimensional) space. They then find a hyperplane in that space that separates the mapped $\vec{w}$s belonging to one class from the rest. There is usually more than one such hyperplane, in which case the one with the maximum margin (i.e. the largest distance to the nearest points on either side) is selected. Previously unseen data are classified by determining on which side of the hyperplane the mapped $\vec{w}$ lies. For the comparison presented here, libsvm 2.83 was used. This package also includes a Python script that performs data scaling and model selection for a C-SVM with a radial basis function kernel (for details, see [38]).
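The classification step of a trained SVM can be sketched in a few lines of standard-library Python. The mapping into the high-dimensional space is never computed explicitly. Instead, an RBF kernel $K(\vec{u},\vec{v}) = \exp(-\gamma\|\vec{u}-\vec{v}\|^2)$ stands in for inner products in that space. The sign of the decision value $f(\vec{w}) = \sum_i \alpha_i y_i K(\vec{w}_i,\vec{w}) + b$ then tells on which side of the hyperplane the mapped $\vec{w}$ lies.

The support vectors, coefficients `alphas`, `bias` and `gamma` in the example are hypothetical, hand-picked values, not the output of a libsvm training run. Only the two-class case is shown. libsvm handles multi-class problems, such as the eight-stimulus task below, by combining binary classifiers one-against-one.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, labels, alphas, bias, gamma):
    """Signed score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    Its sign says on which side of the separating hyperplane (in the
    kernel-induced feature space) the mapped x lies; the mapping itself
    is never computed explicitly.
    """
    score = sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )
    return score + bias


def predict(x, support_vectors, labels, alphas, bias, gamma):
    """Return +1 or -1 depending on the side of the hyperplane."""
    f = decision_value(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if f >= 0 else -1


if __name__ == "__main__":
    # Hypothetical, hand-picked model: one support vector per class.
    svs, ys, alphas = [[0.0], [1.0]], [-1, 1], [1.0, 1.0]
    print(predict([0.9], svs, ys, alphas, bias=0.0, gamma=1.0))  # 1
    print(predict([0.1], svs, ys, alphas, bias=0.0, gamma=1.0))  # -1
```

In a real C-SVM, the $\alpha_i$ and $b$ come from solving the dual optimisation problem, which this sketch does not attempt. $C$ and $\gamma$ are the two parameters that libsvm's model-selection script tunes by cross-validation.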

Gaussian processes were originally designed for regression problems [54]. In brief, the idea is to define a Gaussian process prior over a function space such that the joint density of the function values at any finite number of points is a multivariate Gaussian. If the noise model (i.e. the probability density of the observed data given the values produced by the Gaussian process) is also Gaussian, then predictions for new data points can be evaluated analytically in polynomial time. This is often the case in regression tasks. In classification problems, however, the predictions have to be probabilities of class labels. This necessitates a ‘squashing function’ that maps $[-\infty, \infty]$ onto $[0, 1]$. A popular choice for two-class scenarios is the logistic sigmoid $\sigma(x) = 1/(1+\exp(-x))$, where $x$ is the value predicted by the Gaussian process. Multi-class problems can be tackled, e.g., with the softmax function [92]. Unfortunately, the required integrations can then no longer be carried out analytically, and approximations must be employed. For the comparison presented here, the Monte Carlo approach implemented in the package fbm-2004-11-10 [58] was used.
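The non-analytic step can be made concrete with a minimal standard-library Python sketch. It implements the two squashing functions and a plain Monte Carlo estimate of the predictive class probability $p(y=1) = \int \sigma(f)\,\mathcal{N}(f;\mu,s^2)\,df$.

The sketch assumes that a Gaussian over the latent value $f$ at the test point, with mean `mean` and standard deviation `std`, is already available. That is a simplification. With a sigmoid likelihood the posterior over latent values is itself not Gaussian, and fbm instead uses Markov chain Monte Carlo over latent values and hyperparameters. The function names are illustrative.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(values):
    """Multi-class squashing: exp(v_k) / sum_j exp(v_j)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mc_class_probability(mean, std, n_samples=20000, seed=0):
    """Monte Carlo estimate of p(y=1) = E[sigmoid(f)] with f ~ N(mean, std^2).

    This integral has no closed form, which is why GPC needs approximations.
    """
    rng = random.Random(seed)
    draws = (sigmoid(rng.gauss(mean, std)) for _ in range(n_samples))
    return sum(draws) / n_samples


if __name__ == "__main__":
    print(sigmoid(2.0))                     # squashed latent mean
    print(mc_class_probability(2.0, 3.0))   # averaged over latent uncertainty
    print(softmax([1.0, 2.0, 3.0]))
```

Averaging over the latent uncertainty pulls the probability towards 0.5. For `mean=2.0, std=3.0` the Monte Carlo estimate lies clearly below `sigmoid(2.0)` (about 0.881). Simply squashing the latent mean would therefore overstate the classifier's confidence.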

Training and test data were generated by simulating a neuron's response to eight stimuli. The ‘strongest’ stimulus evoked 60 spikes/s, the ‘second strongest’ 40 spikes/s, the ‘weakest’ 15 spikes/s, and the remaining five stimuli 30 spikes/s each. Responses were recorded over a time window of 100 ms, during which the firing rate did not change. The resulting average firing rate was used as the input to the three algorithms, and the classification target was the stimulus label. All stimuli were presented equally often. This rather simple scenario was chosen because it allows the expected limits of classification performance to be determined a priori.
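This data-generating process can be sketched in standard-library Python. The text does not state how individual spike counts were drawn. As an assumption, the sketch uses a homogeneous Poisson process: each trial's spike count is Poisson-distributed with mean rate × 100 ms, sampled with Knuth's multiplication method. Function and constant names are illustrative.

```python
import math
import random

# Firing rates (spikes/s) as given in the text: strongest, second strongest,
# weakest, and five remaining stimuli at 30 spikes/s.
RATES_HZ = [60.0, 40.0, 15.0, 30.0, 30.0, 30.0, 30.0, 30.0]
WINDOW_S = 0.1  # 100 ms observation window, constant rate within it


def poisson_sample(lam, rng):
    """Poisson variate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_trials(n_per_stimulus, rng):
    """Return shuffled (average firing rate in spikes/s, stimulus label) pairs.

    ASSUMPTION: spike counts follow a homogeneous Poisson process.
    """
    data = []
    for label, rate in enumerate(RATES_HZ):
        for _ in range(n_per_stimulus):
            count = poisson_sample(rate * WINDOW_S, rng)
            data.append((count / WINDOW_S, label))
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    train = simulate_trials(5, rng)   # e.g. 5 trials per stimulus
    test = simulate_trials(100, rng)  # 100 trials per stimulus, as in the text
    print(len(train), len(test))      # 40 800
```

With a 100 ms window, the average firing rate can only take multiples of 10 spikes/s. This coarse discretisation is one reason why few trials carry so little information.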

The BBCa was trained with a maximum of 50 intersections. Two priors over the number of intersections $M$ were tried: a uniform prior and a prior $P(M) \propto 1/M^2$ (i.e. the prior probability of a model is inversely proportional to the computational effort required to evaluate it). They produced very similar results. For the GPC, prior (hyper)parameters for the covariance function need to be chosen. I experimented with various settings and eventually chose a covariance function composed of a linear part and a squared-exponential part. The linear part was given a Gaussian prior with mean 0 and standard deviation 10. The scale and relevance parameters of the exponential part were Gaussian with mean 0 and variances drawn from broad inverse-gamma distributions with means 20 and 5, respectively. For details of the possible prior choices, the reader is referred to the extensive documentation of the fbm-2004-11-10 package. Changing these prior parameters by up to two orders of magnitude did not affect the classification performance substantially. As noted above, the SVM package contains Python scripts that perform parameter selection via cross-validation automatically.
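The structure of such a covariance function can be illustrated with a short standard-library Python sketch. It uses one common parametrisation for a scalar input $x$ (here, the average firing rate): $k(x,x') = c^2 x x' + \eta^2 \exp(-(x-x')^2/\rho^2)$. The linear term corresponds to $f(x) = a x$ with $a \sim \mathcal{N}(0, c^2)$, so $c = 10$ mirrors the prior described above.

The names `lin_scale`, `se_scale` and `relevance`, the exact functional form and all numerical values are assumptions for illustration. fbm's own specification syntax differs. Also, the sketch fixes the hyperparameters instead of placing inverse-gamma priors on them. Because a sum of valid covariance functions is again valid, the Gram matrix admits a Cholesky factorisation (with a tiny diagonal jitter). That factorisation is also what is needed to draw functions from the prior.

```python
import math
import random


def linear_plus_se_kernel(x1, x2, lin_scale, se_scale, relevance):
    """k(x1, x2) = lin_scale^2 * x1 * x2
                   + se_scale^2 * exp(-((x1 - x2) / relevance)^2).
    One common parametrisation, assumed here for illustration."""
    linear = lin_scale ** 2 * x1 * x2
    sq_exp = se_scale ** 2 * math.exp(-((x1 - x2) / relevance) ** 2)
    return linear + sq_exp


def covariance_matrix(xs, **params):
    """Gram matrix K[i][j] = k(xs[i], xs[j]) for scalar inputs xs."""
    return [[linear_plus_se_kernel(a, b, **params) for b in xs] for a in xs]


def cholesky(K, jitter=1e-9):
    """Lower-triangular L with L L^T = K + jitter * I; raises if not PD."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += jitter
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def sample_prior(xs, rng, **params):
    """Draw one function, evaluated at xs, from the zero-mean GP prior."""
    L = cholesky(covariance_matrix(xs, **params))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


if __name__ == "__main__":
    rates = [0.0, 15.0, 30.0, 40.0, 60.0]  # example inputs (spikes/s)
    # Hypothetical hyperparameters; lin_scale=10 mirrors the text's linear prior.
    params = dict(lin_scale=10.0, se_scale=2.0, relevance=10.0)
    print(sample_prior(rates, random.Random(0), **params))
```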

[Figure 4.4 plot: % correct guesses (axis from 12.5 to 24.5) versus # examples per class (1 to 100) for SVM, GPC and BBCa.]

Figure 4.4: Comparison of the percentages of correctly classified stimuli as a function of the number of trials per stimulus. Black circles: BBCa, red squares: SVM, blue diamonds: GPC. Error bars are standard errors computed from 100 repetitions. The theoretical performance limits are 12.5 % (uninformed guess based only on prior knowledge of the stimulus distribution) and ≈24.5 % (best possible expected performance if the response-generating distributions were known). Especially in the neurophysiologically relevant range of only a few available trials per stimulus, the BBCa outperforms SVM and GPC. For details, see text.

The average percentage of correctly classified stimuli as a function of the number of trials per stimulus (i.e. the number of times a response to a given stimulus appeared in the training set) is shown in fig. 4.4. For a given number of trials per stimulus, each classifier was first trained on a training data set. Its performance was then evaluated on a test data set containing 100 trials per stimulus. This procedure was repeated on 100 different training/test data sets to allow for an evaluation of means and standard errors. Since all 8 stimuli were equally likely, one expects a performance of 12.5 % (i.e. 1/8) based on this information alone. If the response-generating distributions were known, the optimal performance as predicted by Bayes' rule would be ≈24.5 %. All three methods appear to converge towards this value, although GPC performs notably worse than the other two. For 1000 trials per stimulus, BBCa and SVM have virtually reached the theoretical optimum (not shown). However, especially in the neurophysiologically relevant range of only a few available trials per stimulus, BBCa outperforms both SVM and GPC. This indicates that BBCa is better suited to neural response classification than the other two methods. Moreover, as detailed below, it allows for an exact evaluation of the evidence (eqn. 4.8). This is necessary if subsequent stages of Bayesian inference are to be conducted without introducing approximation errors, and it is an additional advantage that SVM cannot offer.
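Both performance limits can be checked with a short standard-library Python sketch. The 12.5 % chance level is simply 1/8. For the upper limit, the sketch again assumes Poisson-distributed spike counts in the 100 ms window, since the text does not specify the spike-generation model. The Bayes-optimal classifier then picks, for each observed count $k$, the stimulus with the largest likelihood. Its expected accuracy is $\sum_k \frac{1}{8}\max_c P(k \mid c)$. Names are illustrative.

```python
import math

# Firing rates (spikes/s) from the text; 100 ms window.
RATES_HZ = [60.0, 40.0, 15.0, 30.0, 30.0, 30.0, 30.0, 30.0]
WINDOW_S = 0.1


def poisson_pmf(k, lam):
    """P(k | lam) for a Poisson distribution, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def bayes_optimal_accuracy(rates_hz, window_s, max_count=200):
    """Expected accuracy of the maximum-a-posteriori classifier when the
    response distributions are known and all stimuli are equally likely.

    ASSUMPTION: spike counts are Poisson with mean rate * window.
    """
    lams = [r * window_s for r in rates_hz]
    prior = 1.0 / len(lams)
    # For each count k, the MAP rule is correct with prob. prior * max_c P(k|c).
    return sum(
        prior * max(poisson_pmf(k, lam) for lam in lams)
        for k in range(max_count + 1)
    )


if __name__ == "__main__":
    print(f"chance level: {1 / len(RATES_HZ):.3f}")  # 0.125
    acc = bayes_optimal_accuracy(RATES_HZ, WINDOW_S)
    print(f"Bayes-optimal: {acc:.3f}")  # 0.243
```

Under the Poisson assumption the upper limit comes out at about 24.3 %, close to but slightly below the ≈24.5 % quoted in the text. The Poisson model should therefore be read as an approximation, not a reconstruction of the simulation actually used. The sketch also shows why the limit stays far below 100 % however many trials are available. The five stimuli that share a rate of 30 spikes/s produce identical response distributions and cannot be told apart.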

In document Bayesian and information theoretic tools for neuroscience (Page 85-89)