In this section I discuss the statistics of the responses obtained from the sparse coding network after training. As an example I use the responses from the face discrimination network discussed in Section 5.2.4 (which was trained on 10 images of each of 10 different individuals); these results are typical of those obtained from other runs. For comparison I will look at two cases that strip key features from the sparse coding network. First, I cut the feedback connections but leave the dynamics of the individual neurons intact, so the network dynamics become
\[ \dot{v} = G^{T} u + \lambda S'(v). \tag{5.2} \]
This will illuminate the role feedback plays in recognition performance and sparsening of responses, and provide a prediction of how recognition would suffer in the event that feedback connections were cut in the real biological system. Second, I simply treat the trained G matrix as a feedforward linear filter, that is, I set v = G^T u. This shows how similar each input u is to each learned basis function in the absence of the feedback inhibition that produces winner-take-all-like behavior in the network. This linear feedforward network will allow us to see how much of the sparseness of the responses is due to the form of the learned basis functions and how much is due to the sparsening nature of the dynamics.
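The two reduced networks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the form of S'(v) is not given in this section, so the code uses a placeholder double-well derivative chosen solely so that the unstimulated equilibria sit at 0 and 1; the function names, the value of λ, the step size, and the forward-Euler integration are likewise hypothetical and not the procedure used for the results reported here.

```python
def sparse_prior_slope(v):
    """ASSUMED placeholder for S'(v): derivative of S(v) = -v^2 (v - 1)^2.

    With no input it has stable fixed points at v = 0 and v = 1.
    """
    return -2.0 * v * (v - 1.0) * (2.0 * v - 1.0)


def linear_feedforward(G, u):
    """Linear filter v = G^T u; G is a list of rows (inputs x neurons)."""
    n_neurons = len(G[0])
    return [sum(G[i][j] * u[i] for i in range(len(u)))
            for j in range(n_neurons)]


def dynamic_feedforward(G, u, lam=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dv/dt = G^T u + lam * S'(v) from v = 0.

    There is no coupling between neurons: each one sees only its own
    feedforward drive and its own sparsening dynamics.
    """
    drive = linear_feedforward(G, u)
    v = [0.0] * len(drive)
    for _ in range(steps):
        v = [vj + dt * (bj + lam * sparse_prior_slope(vj))
             for vj, bj in zip(v, drive)]
    return v
```

With this placeholder prior, a weakly driven neuron settles just above 0 and a strongly driven neuron settles somewhat above 1, whereas the linear filter simply returns the drive itself.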
Both the dynamic and linear feedforward networks still perform well according to our classification metrics, with an average ROC accuracy of 89% in both cases (compared to 91% for the feedback network). However, a more detailed look at the responses reveals that true recognition performance would likely suffer somewhat more than the optimal ROC result suggests. The purely linear feedforward model lacks the bimodal response distribution that cleanly separates “on” responses from “off” responses and makes readout particularly easy. The response distribution of the dynamic feedforward network is still bimodal, but while the largest responses of an individual neuron tend to be to its preferred person, many significant responses are to other people due to the lack of inhibitory feedback from other neurons in the network. Hence our model predicts that, if feedback connections in the visual pathway were somehow cut, recognition performance would suffer but not be eliminated entirely; instead we would expect increased confusion between similar people or objects. Feedback is crucial for learning, however: we would expect a person with such an injury to be unable to learn to recognize new people or categories.
Figure 5.13(a) is a histogram of the strength of all responses to all images in the testing data set (100 images times 15 neurons for 1500 total responses). The response distribution is bimodal, as specified by the sparse prior, with most responses near zero. The “large” responses are centered around roughly 1.25, somewhat larger than the second peak location of 1 in the prior, as the inputs bias all responses to be larger than the unstimulated equilibrium points of 0 and 1. The kurtosis excess of this distribution is 8.7, reflecting its sparse and bimodal nature. The responses of the dynamic feedforward network, depicted in Figure 5.13(b), are still bimodal, and are in general larger due to the lack of inhibitory feedback. These responses are still sparse, with a kurtosis excess of 6.6. The responses of the linear feedforward network, shown in Figure 5.13(c), are clearly unimodal and widely varied; due to the nature of the sparse basis functions they are still sparse, though less so, with a kurtosis excess of 3.5.
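For reference, the kurtosis excess of a set of responses can be computed as the fourth central moment divided by the squared variance, minus 3 (so that a Gaussian scores 0). The sketch below assumes the plain population estimator and a hypothetical function name; the exact estimator behind the figures quoted above is not stated in this section.

```python
def kurtosis_excess(responses):
    """Population kurtosis excess: m4 / m2^2 - 3 (zero for a Gaussian)."""
    n = len(responses)
    mean = sum(responses) / n
    m2 = sum((r - mean) ** 2 for r in responses) / n  # variance
    m4 = sum((r - mean) ** 4 for r in responses) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Toy illustration (made-up data, not the thesis responses):
# 90 responses at 0 and 10 at 1, i.e. a sparse two-valued code.
toy = [0.0] * 90 + [1.0] * 10
toy_excess = kurtosis_excess(toy)
```

A distribution with most of its mass near zero and a small fraction of large values, as in the toy data, yields a strongly positive kurtosis excess, which is why the measure is used here as an index of sparseness.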
One often-suggested role for sparseness is the reduction of redundancy by decorrelating neuronal responses (Vinje & Gallant, 2000). Figure 5.14(a) is a histogram of the correlation coefficient between all neuron pairs (15 choose 2, or 105 pairs). Most correlation coefficients are negative, reflecting the inhibitory effect neurons have on one another. Overall correlations are weak, with a mean absolute value of the corre-
[Figure 5.13, panels (a), (b), (c): histograms with horizontal axis v_i and vertical axis "fraction of responses".]
Figure 5.13: Histogram of the strength of all responses to all images in the testing data set (1500 responses total). (a): feedback network depicted in Figure 5.3. (b): the same network with the feedback connections cut. (c): linear feedforward network
[Figure 5.14, panels (a), (b), (c): histograms with horizontal axis "correlation coefficient" and vertical axis "fraction of unit pairs".]
Figure 5.14: Histogram of the correlation coefficient between all neuron pairs (105 neuron pairs total). (a): feedback network depicted in Figure 5.3. (b): the same network with the feedback connections cut. (c): linear feedforward network with the same G matrix.
Figure 5.14(b) is the same histogram for the network with the feedback connections cut; correlations in this setting are somewhat higher, with a mean absolute value of 0.28. Finally, Figure 5.14(c) is the same histogram for the linear feedforward network. Neuronal responses are more strongly correlated in this case, with a mean absolute value of the correlation coefficient of 0.37. From this we see that both the dynamics induced by the sparse prior and the recurrent feedback play a role in decorrelating neural responses. Note that we are not considering temporal correlations here (as our network considers each image separately rather than an image sequence), but correlation in the amplitude of neural responses.
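The pairwise statistic used in this comparison can be sketched as follows: compute the Pearson correlation coefficient between the response amplitudes of every pair of neurons across images, then average the absolute values. The function names and the data layout (one row per image, one column per neuron) are assumptions made for illustration, not a description of the code used to produce Figure 5.14.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(responses):
    """One coefficient per unordered neuron pair.

    responses: list of rows, one per image; each row holds one response
    amplitude per neuron (assumed layout).
    """
    columns = list(zip(*responses))  # one tuple of responses per neuron
    return [pearson(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)]


def mean_abs_correlation(responses):
    """Mean absolute value of the pairwise correlation coefficients."""
    coeffs = pairwise_correlations(responses)
    return sum(abs(c) for c in coeffs) / len(coeffs)
```

For 15 neurons, `pairwise_correlations` returns 105 coefficients, matching the number of pairs in the histograms.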
The network serves to both enhance the sparseness of the responses (through the sparse prior distribution encoded in each neuron’s dynamics) and to reduce the correlation between the responses of different neurons. This decorrelation reduces the redundancy of information carried in the firing rates of different neurons.