Active Audition and Sensorimotor Integration
for Sound Source Localization
Mathieu Bernard
Introduction
CIFRE thesis, co-supervised by:
Patrick Pirim, Brain Vision Systems
Bruno Gas, ISIR, UPMC
Alain de Cheveigné, LPP, Paris Descartes
Outline
Artificial auditory system for sound localization
Audio-tactile model for texture recognition
Active audition and sensorimotor integration
Autonomous robotics
Artificial auditory system
Bioinspired sound localization
Binaural localization cues: ITD and ILD.
[Figure: auditory model. Two outer ears, two inner ears, ITD and ILD outputs.]
Implementation: real-time C++ core library, used on the robot, in robotic simulation and in standalone programs. Inputs: sound capture, simulated sound sources, WAV files.
Outer ear model
[Figure: outer ear. Labels: pinna, auditory fovea, microphone (oriented toward the fovea), support, servomotor.]
Outer ear = Pinna, microphone and software capture. Spectral and directional cues.
Inner ear model (1/2)
Inner ear = cochlear model and pulse train generation.
Cochlear model
Gammatone filterbank
Typically 30 channels from 300 Hz to 8 kHz, with signals sampled at 20 kHz
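A minimal sketch of such a filterbank, assuming 4th-order gammatone filters with center frequencies equally spaced on the ERB-rate scale (Glasberg and Moore formulas) and implemented by direct convolution. The slides do not specify the filter order, the spacing or the implementation, so these choices and all function names are assumptions for illustration only.

```python
import numpy as np

def erb_space(f_low, f_high, n):
    """n center frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erb(np.linspace(to_erb(f_low), to_erb(f_high), n))

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Peak-normalized impulse response of a gammatone filter centered on fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    bandwidth = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)  # 1.019 * ERB(fc), in Hz
    ir = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

def gammatone_filterbank(signal, fs=20000, n_channels=30, f_low=300.0, f_high=8000.0):
    """Return (center frequencies, outputs of shape (n_channels, len(signal)))."""
    centers = erb_space(f_low, f_high, n_channels)
    outputs = np.stack([
        np.convolve(signal, gammatone_ir(fc, fs))[:len(signal)] for fc in centers
    ])
    return centers, outputs
```

Fed with a 1 kHz pure tone, the channel whose center frequency is closest to 1 kHz carries the most energy, which is the frequency-to-place mapping the cochlear model relies on.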
Inner ear model (2/2)
Pulse train generation
Cochlear channel output ⇒ pulse train
Discrete representation, noise suppression.
ILD computation (1/2)
Energy
For each pulse p(t), the energy is:
\[ E(t) = \sum_{u=t-T}^{t} p(u)^2 \qquad (1) \]
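The windowed energy of equation (1) can be computed for every time step at once with a cumulative sum. This is a sketch assuming a uniformly sampled pulse train and a window length T expressed in samples; the function name is hypothetical.

```python
import numpy as np

def windowed_energy(p, T):
    """E(t) = sum of p(u)^2 for u from t-T to t, with T in samples."""
    squared = np.asarray(p, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(squared)))
    t = np.arange(len(squared))
    start = np.maximum(t - T, 0)  # window is truncated at the start of the signal
    return csum[t + 1] - csum[start]
```

For example, `windowed_energy([1, 2, 0, 3], T=1)` sums each squared sample with its predecessor and returns `[1, 5, 4, 9]`.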
ILD computation (2/2)
ILD = Interaural Level Difference
For each channel i:
\[ \mathrm{ILD}_i(t) = 2E_{\mathrm{left},i}(t) \,\ldots \]
ITD computation (1/3)
ITD = Onset extraction and delay lines.
Onset extraction from a pulse train
Comparison with a dynamic threshold; two parameters.
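One possible reading of this step, as a sketch only: a pulse is kept as an onset when its amplitude exceeds a threshold that jumps after each onset and then relaxes toward zero. The slides do not define the threshold rule or name its two parameters, so `gain` and `decay` below are hypothetical.

```python
import numpy as np

def extract_onsets(pulses, gain=1.5, decay=0.99):
    """Return the sample indices of pulses that exceed a dynamic threshold.

    gain  (assumed parameter): threshold is set to gain * amplitude after an onset.
    decay (assumed parameter): per-sample multiplicative relaxation of the threshold.
    """
    threshold = 0.0
    onsets = []
    for t, amplitude in enumerate(pulses):
        if amplitude > threshold:
            onsets.append(t)
            threshold = gain * amplitude  # raise the threshold right after an onset
        else:
            threshold *= decay            # let the threshold relax toward zero
    return np.array(onsets, dtype=int)
```

With this rule, a sustained burst of pulses produces a single onset at its first pulse, and a new onset is only reported once the threshold has decayed.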
ITD computation (2/3)
ITD computation (3/3)
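A sketch of how delay lines can turn two onset trains into an ITD estimate, assuming a Jeffress-style bank of coincidence detectors: each candidate lag counts the onsets that coincide once one ear's train is delayed by that lag, and the best-matching lag is taken as the ITD. The slides do not detail the delay-line structure, so this scheme and the function name are assumptions.

```python
import numpy as np

def itd_from_onsets(left, right, max_lag):
    """Estimate the ITD, in samples, from two binary onset trains of equal length.

    A positive result means the right-ear onsets lag the left-ear onsets.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    counts = []
    for lag in lags:
        if lag >= 0:
            # coincidences between left[t] and right[t + lag]
            counts.append(np.dot(left[:n - lag], right[lag:]))
        else:
            # coincidences between left[t - lag] and right[t]
            counts.append(np.dot(left[-lag:], right[:n + lag]))
    counts = np.array(counts)
    return int(lags[int(np.argmax(counts))]), counts
```

Dividing the returned lag by the sampling rate gives the ITD in seconds; at 20 kHz, a lag of 7 samples corresponds to 350 µs.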
Auditory and tactile model for texture perception (1/3)
Similar model for transduction and processing
Audition and touch both support fine texture discrimination.
Rat vibrissae and the cochlea both transduce by resonance.
Auditory and tactile spectral processing interact strongly.
Auditory and tactile model for texture perception (2/3)
Whisker filterbank adapted from the gammatone cochlear model.
Feature extraction = instantaneous mean power.
Auditory and tactile model for texture perception (3/3)
Texture classification: 8 textures, 3-layer perceptron
Response to a pure tone (whisker at 630 Hz)
Auditory Evoked Behaviors (1/2)
[Figure: block diagram. For each ear, outer ear, inner ear and energy (auditory model); the resulting ILD feeds the motor control of the neck and body.]
Motor control
Neck ⇒ ILD minimization
ILD < 0 ⇒ turn right, ILD > 0 ⇒ turn left. (Video)
Wheels ⇒ follow neck orientation
Constant speed, smooth rotations. (Video)
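The neck rule above amounts to a simple closed loop. The sketch below assumes a toy sensorimotor law in which the ILD is the sine of the angle between source and head, with angles in radians and counted positive to the left; this law, the gain and the function names are assumptions made for illustration, not the robot's actual controller.

```python
import math

def toy_ild(head_azimuth, source_azimuth):
    """Hypothetical ILD: positive when the source is to the left of the head."""
    return math.sin(source_azimuth - head_azimuth)

def orient(source_azimuth, head_azimuth=0.0, gain=0.2, tol=1e-3, max_steps=1000):
    """Rotate the neck until the ILD is (almost) cancelled."""
    for _ in range(max_steps):
        ild = toy_ild(head_azimuth, source_azimuth)
        if abs(ild) < tol:
            break
        # ILD > 0: turn left (increase azimuth); ILD < 0: turn right
        head_azimuth += gain * ild
    return head_azimuth
```

Starting from azimuth 0 with a source at 1 rad, the loop stops with the head pointing at the source to within roughly the tolerance. A source exactly behind the head also gives a zero ILD under this toy law, so the sketch does not address front-back ambiguity.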
Auditory Evoked Behaviors (2/2)
Sensorimotor Approach (1/3)
Environment state e ∈ E and motor state m ∈ M.
Sensory state s ∈ S: we have s = Φ(m, e), where Φ is called the sensorimotor law.
Sensorimotor Approach (2/3)
When we say [...] that we localize such an object at such a point in
space [...] this simply means that we represent to ourselves
the movements that must be made to reach that object.
[...] We represent to ourselves the muscular sensations that
accompany [these movements] and that have no geometric
character, and that therefore in no way imply the
pre-existence of the notion of space.
H. Poincaré, L'espace et la géométrie, 1895.
Proposed formalization for localization
Find the motor state m̃ such that …
Sensorimotor Approach (3/3)
Sound source localization
Two ways to estimate m̃
Orienting behavior
After completion we have m̃ = m_end:
\[ s_{\mathrm{end}} = \Phi(m_{\mathrm{end}}, e) = \Phi(m_0 + \delta m,\; e_0 + \delta e), \]
Manifold learning
The sensory space S lies on a low-dimensional manifold R,
Dimension reduction technique: S → R,
Same topology as the embodying space (at least locally).
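To illustrate the idea of mapping S to a low-dimensional R, here is a sketch that uses PCA as a stand-in. The slides do not name the dimension reduction technique, and PCA is linear, so it only recovers manifolds that are close to flat. The synthetic sensory data (a 30-dimensional vector driven by a single azimuth through a random mixing matrix) is likewise an assumption.

```python
import numpy as np

def pca_embed(S, n_components):
    """Project the rows of S onto their first n_components principal axes."""
    centered = S - S.mean(axis=0)
    _, singular, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular ** 2 / np.sum(singular ** 2)  # variance ratio per axis
    return centered @ Vt[:n_components].T, explained

# Hypothetical sensory vectors: 200 azimuths observed through 30 channels.
rng = np.random.default_rng(0)
azimuth = np.linspace(-np.pi / 2, np.pi / 2, 200)
mixing = rng.normal(size=(2, 30))
S = np.stack([np.sin(azimuth), np.cos(azimuth)], axis=1) @ mixing

R, explained = pca_embed(S, n_components=2)
```

Here two components capture essentially all the variance, and points that are neighbors in azimuth stay neighbors in R, which is the local topology preservation the slide refers to.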
Manifold learning (1/3)
Manifold learning (2/3)
CAMIL database, ITD vectors: d = 690 → d = 3
Manifold learning (3/3)
Two outer ear models: HRTF and directional filters
Sensorimotor integration (1/2)
Learning algorithm
Iterative, self-supervised learning of R.
Sensorimotor integration (2/2)
Intrinsic Dimension Estimation
Simulation (ILD, azimuth 180° & 360°)
CAMIL dataset (ILD & ITD, azimuth 360°, elevation [-30°, 30°])