Neural coding
Population coding
http://www.phys.ens.fr/~nadal/Cours/MVA
Jean-Pierre Nadal
CNRS & EHESS
Laboratoire de Physique de l’ENS
(LPENS, UMR 8023 CNRS – ENS – SU – Université de Paris)
Ecole Normale Supérieure (ENS)
&
Centre d’Analyse et de Mathématique Sociales
(CAMS, UMR 8557 CNRS – EHESS)
Ecole des Hautes Etudes en Sciences Sociales (EHESS)
Menu
This part:
Population coding
Continuous stimuli (e.g. an orientation)
tuning curves
Tuning curves
A: Recording from a neuron in the primary visual cortex (V1, area 17, striate cortex) of a monkey presented with moving bars of light falling on the neuron’s receptive field.
B: Gaussian tuning curve fitted to the responses.
Hubel and Wiesel, 1968; Henry et al 1974; Wandell 1995
Jeffrey Taube
very large number of coding cells
each cell has its preferred stimulus
Preferred direction
a
Example:
Head direction cells
Population vector
Coding of different movement directions by a population of neurons in the motor cortex.
Weighted vectorial contributions of individual cells (light purple lines) sum to yield a population vector (orange) which is congruent with the direction of movement (yellow).
Georgopoulos, Schwartz & Kettner, Science 1986
Neuronal population coding of movement direction
Neural population coding for movement direction
Each cell $i$ has a preferred direction $\vec{u}_i$ (unit vector)
Population vector: each neuron votes in favor of its preferred direction
weight of the vote of cell $i$ = its mean activity $r_i$
direction estimated as:
$\hat{\vec{u}} = \sum_i r_i\, \vec{u}_i \,/\, \sum_i r_i$
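The voting rule above can be sketched in a few lines. This is a minimal illustration, not the analysis of Georgopoulos et al.: the function name `population_vector`, the 16 evenly spaced cells and the baseline-plus-cosine tuning are all assumptions made for the example.

```python
import math

def population_vector(preferred_dirs, activities):
    """Estimate a direction (radians) from preferred directions and mean activities.

    Each cell votes for its preferred direction with a weight equal to its
    mean activity; the estimate is the angle of the weighted vector sum.
    """
    total = sum(activities)
    if total <= 0:
        raise ValueError("activities must have a positive sum")
    x = sum(r * math.cos(a) for a, r in zip(preferred_dirs, activities)) / total
    y = sum(r * math.sin(a) for a, r in zip(preferred_dirs, activities)) / total
    return math.atan2(y, x) % (2 * math.pi)

# Hypothetical population: evenly spaced preferred directions and
# cosine tuning on top of a baseline rate (assumed, noiseless activities).
p = 16
preferred = [2 * math.pi * i / p for i in range(p)]
true_direction = math.radians(70.0)
rates = [10.0 + 8.0 * math.cos(true_direction - a) for a in preferred]

estimate = population_vector(preferred, rates)
```

With cosine tuning and evenly spaced preferred directions, the weighted sum points exactly along the true direction; with other tuning shapes or noisy activities the read-out is only approximate.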
Is the cell coding for its preferred stimulus?
(as suggested by the population vector analysis)
Head direction cells
POPULATION CODING
Each cell: spikes according to a Poisson process
$\theta \to \nu(\theta) \to k$ spikes in $[0, t]$
$Q_t(k \mid \theta) = \dfrac{(\nu(\theta)\,t)^{k}}{k!}\, e^{-\nu(\theta)\,t}$
Population: large number $p$ of cells
Tuning curves: $\nu_i(\theta)$, $i = 1, \dots, p$
Preferred stimulus for cell $i$: $\theta_i$
$\nu_i(\theta) = R_i\, \varphi\!\left(\dfrac{\theta - \theta_i}{a_i}\right)$
POPULATION CODING
Parameter estimation approach [H S Seung & H Sompolinsky, 1993]
Fisher information
Coding (information theoretic) approach [N Brunel & JPN, 1998]
Case of unbiased estimator:
unknown parameter $\theta$, data $x$ drawn from $P(x \mid \theta)$, estimator $\hat\theta(x)$
Quadratic error: $\left\langle (\hat\theta(x) - \theta)^2 \right\rangle_\theta$
Cramer-Rao bound (1945): $\left\langle (\hat\theta(x) - \theta)^2 \right\rangle_\theta \ge \dfrac{1}{F(\theta)}$
where $F(\theta)$ is the Fisher Information: $F(\theta) = -\left\langle \dfrac{\partial^2 \ln P(x \mid \theta)}{\partial\theta^2} \right\rangle_\theta$
Optimal bound: equality for specific cases (« efficient estimator »)
Similar to the uncertainty principle in Quantum Mechanics
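A case where the bound is reached can be checked numerically. The sketch below assumes a single Poisson neuron whose unknown parameter is its firing rate, estimated by the spike count divided by the window length; the names `poisson_pmf` and `rate_estimator_stats` and the numerical values are illustrative only. For this model the Fisher information about the rate is t / rate, and the estimator's variance equals the Cramer-Rao bound.

```python
import math

def poisson_pmf(k, mean):
    """Probability of k spikes for a Poisson count with the given mean."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def rate_estimator_stats(rate, t, k_max=400):
    """Exact mean and variance of the estimator k / t, summing over the count k."""
    mean_count = rate * t
    probs = [poisson_pmf(k, mean_count) for k in range(k_max + 1)]
    m1 = sum(p * (k / t) for k, p in enumerate(probs))
    m2 = sum(p * (k / t) ** 2 for k, p in enumerate(probs))
    return m1, m2 - m1 ** 2

rate, t = 20.0, 0.5             # hypothetical firing rate (Hz) and window (s)
fisher = t / rate               # Fisher information about the rate
mean_est, var_est = rate_estimator_stats(rate, t)
cramer_rao_bound = 1.0 / fisher
```

Here `mean_est` matches the true rate (no bias) and `var_est` matches `cramer_rao_bound`, so k / t is an efficient estimator in this simple setting.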
Cramer-Rao with bias
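Data/stimulus in higher dimension
Covariance matrix $\Sigma(\theta)$
Fisher information matrix $F(\theta)$
Cramer-Rao bound (unbiased case): $\Sigma(\theta) \ge F(\theta)^{-1}$
(with bias): $\Sigma(\theta) \ge \dfrac{\partial \langle\hat\theta\rangle_\theta}{\partial\theta}\; F(\theta)^{-1} \left( \dfrac{\partial \langle\hat\theta\rangle_\theta}{\partial\theta} \right)^{T}$
Here the matrix inequality $A \ge B$ means $A - B$ positive semidefinite.
$\dfrac{\partial \langle\hat\theta\rangle_\theta}{\partial\theta}$ is the Jacobian matrix, with entries $\dfrac{\partial \langle\hat\theta_i\rangle_\theta}{\partial\theta_j}$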
Sompolinsky, 1993, 2001
Link with decision making and psychophysics: discriminability (or sensitivity)
Measure of the performance in a discrimination task
$\theta_1 = \theta$ vs. $\theta_2 = \theta + \delta\theta$
See lecture on Decision making (or $\theta_1$ = Noise, $\theta_2$ = Signal + Noise)
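As a hedged sketch of how the discriminability relates to the Fisher information: assuming the two stimuli are discriminated from an unbiased estimate $\hat\theta$ with approximately Gaussian errors, the sensitivity index $d'$ (a symbol not used in the slides) is bounded through the Cramer-Rao inequality, with equality for an efficient estimator.

```latex
d' \;=\; \frac{|\theta_2 - \theta_1|}{\sqrt{\left\langle (\hat\theta - \theta)^2 \right\rangle_\theta}}
\;\le\; |\delta\theta| \, \sqrt{F(\theta)}
```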
Maximum Likelihood estimator
Fisher information and width of the ML error curve: $\delta\theta \sim 1 / \sqrt{F(\theta)}$
Back to our model: population coding with Poisson neurons
$\theta \to \{\nu_i(\theta),\ i = 1, \dots, p\} \to \{k_i,\ i = 1, \dots, p\}$ (numbers of spikes in $[0, t]$)
$Q_t(\{k_i\}_{i=1}^{p} \mid \theta) = \prod_{i=1}^{p} \dfrac{(\nu_i(\theta)\,t)^{k_i}}{k_i!}\, e^{-\nu_i(\theta)\,t}$
$\nu_i(\theta)$: tuning curve (= mean response)
Fisher information
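For the independent Poisson population defined above, the Fisher information can be obtained directly from the likelihood $Q_t$. The following is a sketch of the standard computation, using only the fact that the mean count of cell $i$ is $\nu_i(\theta)\,t$; primes denote derivatives with respect to $\theta$.

```latex
\begin{align}
\ln Q_t(\{k_i\} \mid \theta) &= \sum_{i=1}^{p} \left[ k_i \ln\big(\nu_i(\theta)\,t\big) - \nu_i(\theta)\,t - \ln k_i! \right] \\
\frac{\partial^2 \ln Q_t}{\partial\theta^2} &= \sum_{i=1}^{p} \left[ k_i \left( \frac{\nu_i''(\theta)}{\nu_i(\theta)} - \frac{\nu_i'(\theta)^2}{\nu_i(\theta)^2} \right) - \nu_i''(\theta)\,t \right] \\
F(\theta) &= -\left\langle \frac{\partial^2 \ln Q_t}{\partial\theta^2} \right\rangle_\theta
= t \sum_{i=1}^{p} \frac{\nu_i'(\theta)^2}{\nu_i(\theta)}
\qquad \text{using } \langle k_i \rangle_\theta = \nu_i(\theta)\,t
\end{align}
```

Each cell contributes a term proportional to the squared slope of its tuning curve at the stimulus value, and the total is proportional to the time window $t$.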
Figure from Seung & Sompolinsky 1993: tuning curve and Fisher information
Hence, more (Fisher) information comes from cells with high slopes at the stimulus value, not from cells with their preferred stimuli close to the stimulus value.
Rem.: similar to what has been said about the maximization of mutual information, see Laughlin’s case.
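The statement about slopes can be illustrated numerically. The sketch below assumes a Poisson neuron with a Gaussian tuning curve and computes the single-cell term t ν'(θ)² / ν(θ) as a function of the distance between the stimulus and the preferred stimulus; the function names and all parameter values are hypothetical.

```python
import math

def gaussian_tuning(theta, preferred, r_max=30.0, width=0.3):
    """Mean firing rate nu_i(theta) for a Gaussian tuning curve (assumed shape)."""
    return r_max * math.exp(-((theta - preferred) ** 2) / (2 * width ** 2))

def fisher_contribution(theta, preferred, t=0.2, r_max=30.0, width=0.3):
    """Single-cell term t * nu'(theta)^2 / nu(theta) for a Poisson neuron."""
    nu = gaussian_tuning(theta, preferred, r_max, width)
    d_nu = -nu * (theta - preferred) / width ** 2   # analytic derivative
    return t * d_nu ** 2 / nu

theta = 0.0
offsets = [i * 0.01 for i in range(-150, 151)]      # preferred stimulus minus theta
contribs = [fisher_contribution(theta, theta + d) for d in offsets]
best_offset = abs(offsets[contribs.index(max(contribs))])
at_preferred = fisher_contribution(theta, theta)
```

A cell whose preferred stimulus coincides with the stimulus contributes nothing (`at_preferred` is zero), while the largest contribution comes from cells whose preferred stimulus lies on the flank, at a distance of about √2 times the tuning width for this particular tuning shape.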
Mutual information
Mutual information between stimulus and neural code
Asymptotic limit (large population and large time limit):
$I(\theta, X) \to I_F(\theta) \triangleq -\int \rho(\theta) \ln \rho(\theta)\, d\theta \;-\; \dfrac{1}{2} \int \rho(\theta) \ln\!\left( \dfrac{2\pi e}{F(\theta)} \right) d\theta$
Limit approached from below: $I(\theta, X) \le I_F(\theta)$
Statistical inference context (iid observations):
• B Clarke & A Barron, IEEE Info Theory 1990
• J Rissanen, IEEE Info Theory 1996
Neuroscience context:
• N Brunel & JPN, Neural computation 1998
• X-X Wei & A A Stocker, Neural computation 2016
As seen previously, the Fisher information is proportional to the time window length $t$ and scales with the number of cells $N$ (for independent cells given the stimulus). Hence, at leading order,
$I(\theta, X) \sim \dfrac{1}{2} \ln (t\, N)$
Universal behavior: whatever the model/system, whenever the Fisher information exists,
- in a large signal-to-noise limit (low noise or large system), the above expression applies (with generalisation to multidimensional input);
- asymptotically the mutual information increases as ½ ln ‘data size’.
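The ½ ln scaling can be made concrete with a toy calculation. The sketch below assumes a uniform prior on a circular variable and a Fisher information that does not depend on the stimulus and grows as t × N × c, where `c` is a hypothetical per-cell constant; it evaluates the asymptotic expression for the information (in nats) for two population sizes.

```python
import math

def asymptotic_information(fisher, prior_width=2 * math.pi):
    """Asymptotic information (nats) for a uniform prior of the given width
    and a stimulus-independent Fisher information."""
    prior_entropy = math.log(prior_width)
    return prior_entropy - 0.5 * math.log(2 * math.pi * math.e / fisher)

def fisher_information(t, n_cells, c=5.0):
    """Toy scaling F = t * N * c; the constant c is an assumption."""
    return t * n_cells * c

i_small = asymptotic_information(fisher_information(t=0.5, n_cells=1000))
i_large = asymptotic_information(fisher_information(t=0.5, n_cells=2000))
gain = i_large - i_small
```

Doubling the number of cells (or, equivalently here, the time window) adds ½ ln 2 nats, that is half a bit, whatever the value of the constant `c`.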
Back to the single cell case in the low noise limit
Model: deterministic response + noise, with vanishing Gaussian noise (see discussion on Laughlin’s analysis)
Mutual information:
Fisher:
Jeffreys prior ⇒
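The formulas of this slide did not survive extraction. As a hedged reconstruction of the argument, assume a single cell with response $V = f(\theta) + \sigma z$, where the transfer function $f$, the noise amplitude $\sigma$ and the standard Gaussian variable $z$ are notations introduced here. Applying the asymptotic expression for the mutual information then gives:

```latex
\begin{align}
V &= f(\theta) + \sigma z, \qquad z \sim \mathcal{N}(0, 1), \quad \sigma \to 0 \\
F(\theta) &= \frac{f'(\theta)^2}{\sigma^2} \\
I(\theta, V) &\simeq -\int \rho(\theta) \ln \rho(\theta)\, d\theta
- \frac{1}{2} \int \rho(\theta) \ln\!\left( \frac{2\pi e\, \sigma^2}{f'(\theta)^2} \right) d\theta \\
\text{maximum over } f \text{ (bounded output range):} \quad
f'(\theta) &\propto \rho(\theta) \;\Longleftrightarrow\; \rho(\theta) \propto \sqrt{F(\theta)}
\end{align}
```

The optimal transfer function is the one found in Laughlin's analysis, and the condition $\rho(\theta) \propto \sqrt{F(\theta)}$ has the form of a Jeffreys prior.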
Efficient coding, Fisher information and psychophysics
Issue: measurement units for the stimulus? Natural scale?
« psychophysical function » based on the « just noticeable difference » (JND)
Example, the Weber-Fechner law (1860): the smallest noticeable increment in perception is constant if the relative stimulus increment is constant
"In order that the intensity of a sensation may increase in arithmetical progression, the stimulus must increase in geometrical progression."
perceived stimulus intensity varies as ~ log(stimulus intensity)
L. Kostal J Math Psycho. 2016; L. Kostal & P. Lansky, Scientific Reports, 2016
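The logarithmic form follows from the statement of the law in one step. In the sketch below, $s$ denotes the stimulus intensity, $\lambda$ the perceived intensity, and $c$, $s_0$ are constants; all four symbols are notations assumed for this example.

```latex
\begin{align}
\Delta\lambda &= c\, \frac{\Delta s}{s} \qquad \text{(constant perceptual increment for constant relative increment)} \\
\frac{d\lambda}{ds} &= \frac{c}{s} \;\Longrightarrow\; \lambda(s) = c \ln\frac{s}{s_0}
\end{align}
```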
Efficient coding, Fisher information and psychophysics
Issue: measurement units for the stimulus? Natural scale?
$\Delta\lambda$ = just noticeable difference (JND) in the perception
Psychophysics suggests $\Delta\lambda$ = cst, that is, it is independent of $\lambda$
Perceptual sensation $\lambda$ = function of the stimulus intensity
Cramer-Rao: Hence
Efficient coding:
Psychophysical function
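The formulas attached to these three steps were lost in extraction. A plausible reading, offered only as a sketch, writes the stimulus intensity as $\theta$ with distribution $\rho(\theta)$ and the perceptual sensation as $\lambda(\theta)$, and assumes an efficient read-out so that the smallest discriminable stimulus difference is set by the Cramer-Rao bound:

```latex
\begin{align}
\Delta\lambda &= \lambda'(\theta)\, \Delta\theta, \qquad
\Delta\theta \simeq \frac{1}{\sqrt{F(\theta)}} \quad \text{(Cramer-Rao, efficient read-out)} \\
\Delta\lambda = \text{cst} &\;\Longrightarrow\; \lambda'(\theta) \propto \sqrt{F(\theta)} \\
\text{efficient coding: } \sqrt{F(\theta)} \propto \rho(\theta)
&\;\Longrightarrow\; \lambda(\theta) \propto \int^{\theta} \rho(\theta')\, d\theta'
\end{align}
```

Under these assumptions the psychophysical function is the cumulative distribution of the stimulus, up to a constant factor.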
Mutual information between stimulus and neural code
Asymptotic limit (large population and large time limit):
$I(\theta, X) \to I_F(\theta) = -\int \rho(\theta) \ln \rho(\theta)\, d\theta \;-\; \dfrac{1}{2} \int \rho(\theta) \ln\!\left( \dfrac{2\pi e}{F(\theta)} \right) d\theta$
at leading order: $I(\theta, X) \sim \dfrac{1}{2} \ln (t\, N)$
This asymptotic limit is valid whenever the Fisher information is well defined.
If the Fisher information does not exist (being infinite due to singularities),
• the mutual information is still well defined,
• the information still scales with the logarithm of the data size,
• the prefactor is no longer ½, but higher, and still a rational number. In the simplest case, the prefactor is 1.
Ref: Haussler & Opper, « Mutual information, metric entropy and cumulative relative entropy risk », 1997 (https://projecteuclid.org/euclid.aos/1030741081)
Back to the population code
Example: head direction cells, triangular tuning curves (preferred direction, width a)
F(θ) =
[Figure from N. Brunel & JPN 1998: information and Cramer-Rao bound, 5000 cells]
However, in the opposite limit, short times (or low signal to noise ratio): different predictions from mutual information and Fisher information
[Figure: SD(error) and mutual information vs. width of the tuning curve (degrees)]
Butts & Goldman, Plos Bio. 2006
response-specific information:
stimulus-specific information:
Information carried by cells in high and low signal to noise regimes
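The two quantities named above can be computed on a small discrete example. The sketch assumes the usual definitions, which are not spelled out in the slide text: the response-specific information is the reduction of stimulus entropy given one response, H[S] − H[S|r], and the stimulus-specific information of a stimulus s is the average of that quantity over the responses evoked by s. The toy prior and likelihood are invented for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_measures(prior, likelihood):
    """prior[s] = p(s); likelihood[s][r] = p(r|s).

    Returns (i_sp, ssi, mutual_info), where i_sp[r] = H[S] - H[S|r] and
    ssi[s] = sum_r p(r|s) * i_sp[r].
    """
    n_s, n_r = len(prior), len(likelihood[0])
    p_r = [sum(prior[s] * likelihood[s][r] for s in range(n_s)) for r in range(n_r)]
    h_s = entropy(prior)
    i_sp = []
    for r in range(n_r):
        posterior = [prior[s] * likelihood[s][r] / p_r[r] for s in range(n_s)]
        i_sp.append(h_s - entropy(posterior))
    ssi = [sum(likelihood[s][r] * i_sp[r] for r in range(n_r)) for s in range(n_s)]
    mutual_info = sum(p_r[r] * i_sp[r] for r in range(n_r))
    return i_sp, ssi, mutual_info

# Hypothetical toy code: 3 stimuli, 2 responses (numbers chosen for illustration).
prior = [0.5, 0.25, 0.25]
likelihood = [[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]]
i_sp, ssi, mutual_info = information_measures(prior, likelihood)
```

Both measures average to the mutual information: weighting `i_sp` by the response probabilities, or `ssi` by the prior, gives the same number.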
Coding, issues: Noise
Fluctuations in stimuli
visual stimulus = photons = random events
Intrinsic noise – computation with unreliable elements
- von Neumann, 1952!
- noisy receptors; unreliable synapses; ion channels intrinsic noise; quasi-random inputs from many neurons
Noise not as large as thought to be
natural stimuli → more reliable spike timing
Mainen & Sejnowski, Science 1995; Baudot et al, Frontiers in neural circuits 2013
spike based computation: every spike carries information
Boerlin & Denève, PLoS Comp. Biol. 2010
Noise resulting from / allowing efficient computation
stochastic resonance
Wiesenfeld & Moss, Nature 1995 ; McDonnell & Ward, Nature Reviews Neurosc. 2011
balanced networks
Noise correlations in neural populations
Issues: Correlations
Fisher information:
For some types of correlations, FI can have a finite limit in the large size limit
(instead of being proportional to the number of coding cells)
L. Abbott & P. Dayan, 1999; H. Yoon & H. Sompolinsky, 1999; H. Sompolinsky et al, 2001
Some types of correlations may increase information
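Both statements can be seen in a deliberately simple toy model, which is not the model of the cited papers. It assumes additive Gaussian noise with a stimulus-independent covariance C = σ²[(1 − c) I + c 11ᵀ] (the same correlation c for every pair of cells), for which the Fisher information is f'ᵀ C⁻¹ f', with f' the vector of tuning-curve slopes. The function name and the numbers are illustrative.

```python
def fisher_uniform_correlation(slopes, sigma, c):
    """Fisher information f'^T C^{-1} f' for Gaussian noise with covariance
    C = sigma^2 [(1 - c) I + c 11^T] (uniform pairwise correlation c)."""
    n = len(slopes)
    s1 = sum(slopes)
    s2 = sum(x * x for x in slopes)
    # Sherman-Morrison inverse of C applied to the slope vector.
    return (s2 - c * s1 * s1 / (1 - c + n * c)) / (sigma ** 2 * (1 - c))

c = 0.1
# Identical slopes: the shared noise cannot be averaged out.
equal = [fisher_uniform_correlation([1.0] * n, 1.0, c) for n in (10, 100, 10000)]
saturation = 1.0 / c            # large-N limit f'^2 / (sigma^2 c) for unit slopes

# Slopes of alternating sign: the shared noise cancels in the read-out.
balanced = fisher_uniform_correlation([1.0, -1.0] * 50, 1.0, c)
independent = fisher_uniform_correlation([1.0, -1.0] * 50, 1.0, 0.0)
```

With identical slopes the information approaches the finite value `saturation` instead of growing with the number of cells; with slopes of alternating sign the same correlations give more information than independent noise (`balanced` exceeds `independent`).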
Issues: What is the code?
Efficient coding: given a hypothesis on what carries information, analysis of the code efficiency. But this in itself does not validate the hypothesis.
Examples:
• population coding: coding a stimulus or a probability distribution?
CH Anderson, Basic elements of biological computation systems, Int J. of Phys. C, 1994
R Zemel, P Dayan, A Pouget, Probabilistic interpretation of population codes, Neural Computation, 1998
• spike based computation: every spike carries information
Boerlin & Denève, PLoS Comp. Biol. 2010
• computation with attractors (see lectures on Decision making and on Memory)
• information in the transient dynamics - computation from low-dimensional dynamics
Females prefer males with the brightest yellow head
The Egyptian vulture
Hint:
this vulture gets its yellow colour from the consumption of excrements…
(it has been nicknamed in Spanish « moniguero » = « dung-eater »)
J. J. Negro et al, Nature 2002
Signal reliability: cost (exposure to gastro-intestinal parasites)
Evolutionary mechanism: carotenoid pigments would diffuse passively to the skin; the resulting yellow coloration could have become a useful signal in mating displays.
Costly signaling / the handicap principle (Zahavi)
Modeling: evolutionary game theory
Neuroscience: environment → stimulus → neural code → decoding, decision making
Statistical (Bayesian) inference: prior distribution → parameter → observations → estimation
Ethology: handicap principle (Zahavi, 70’s) / Game theory: costly signaling (90’s): population distribution → hidden quality → signal → selection
On statistical inference as a game against Nature, see:
• the introduction in D. Haussler and M. Opper, The Annals of Statistics 1997, Vol. 25, No. 6, 2451-2492, http://projecteuclid.org/euclid.aos/1030741081
• P. D. Grünwald and A. P. Dawid, The Annals of Statistics 2004, Vol. 32, No. 4, 1367–1433, https://projecteuclid.org/euclid.aos/1091626173