MMSP 2014
DSP Algorithms and Adaptive systems
for AUDIO applications
A3lab people
Full professor
• Francesco Piazza
Researcher
• Simone Fiori
• Stefano Squartini
• Christian Morbidoni
Post Doc Researcher
• Stefania Cecchi
• Paolo Peretti
• Emanuele Principi
• Michele Nucci
• Marco Grassi
Ph.D. Student
• Marco Virgulti
• Leonardo Gabrielli
• Francesco Faccenda
• Marco Fagiani
• Marco Severini
A3lab activities
Main research activity areas
• DSP algorithms for Audio and Multimedia applications
• Computational Intelligence algorithms for Multimedia
• Real-time processing oriented
Teaching activity
• Circuit Theory
• Digital Signal Processing and Computational Intelligence
• Electrical Machines and Systems
Projects and collaborations
• Projects funded by public agencies (including the European Community) and private companies
• Several active collaborations with international academies and enterprises
A3lab facilities
Audio-DSP Laboratory
• Equipped with professional audio instrumentation (e.g., professional sound cards, microphones, loudspeakers)
Semi-Anechoic Chamber
• The chamber dimensions are 9 m x 7 m x 5 m
• The chamber is ISO qualified
Some A3lab projects
Funded European Projects
– hArtes – FP6 [2006-2010] – 634K
– DISCOVERY – eContentPlus [2006-2010] – 300K
– SEMLIB – FP7 [2010-2012] – 300K
Funded Regional Projects
– TASCA – POR [2010-2011] – 144K
– SAIYL – POR [2009-2010] – 111K
– HOMELINE – POR [2009-2010] – 40K
Funded Private Projects
– Moretti Forni – "Giovane Tecnologo" [2009-2010] – 40K
– FBT – "Giovane Tecnologo" [2009-2010] – 40K
– eDoor – 598 [2007-2008] – 56K
– CMT – 598 [2006-2007] – 20K
– Line Arrays – [2007-2009] – 99K
– Others – [2008-2010] – 50K
EU COST Actions
– COST A32 – [2006-2010]
– COST 2102 – [2006-2010]
– COST 277 – [2001-2005]
A3lab collaborations
Academia/research centers (formal): University of Illinois at Chicago (USA), South China University of Technology (China), Fondazione Bruno Kessler (Italy), Università La Sapienza (Italy), Digital Enterprise Research Institute (Ireland), Texas Instruments European University Program (equipment donations received from 2010 to date).
Academia/research centers (informal, active Erasmus links): Riken Institute (Japan), University of Stirling (UK), University of Windsor (Canada), University of Aachen (Germany), Fraunhofer Institute (Germany), Escola Universitaria Politecnica de Matarò (Spain), Aalto University (Finland), Technical University of Munich (Germany) and others.
Companies/Enterprises: Texas Instruments, Thales, Thomson, HP, Roland Europe, KORG, Faital, CMT, FBT, Indesit, Radvision Italia, Proietti Planet, AYT, Microhard, Imolinfo, NET7, and more.
A3lab research fields
Audio Rendering
• Optimize the listening experience according to the characteristics of the acoustic environment and the user's needs
Speech-interfaced Systems
• Systems in which speech is used to enable a given service
Digital Music
• Digital music processing
Audio Rendering
Multichannel Equalization
Wave-field Synthesis and Analysis
Reverberation
3D audio
Acoustic Echo Cancellation
Multichannel Equalization
• Fixed: to compensate for the response of small environments
• Adaptive: to account for time-varying environments
• Multipoint: to enlarge the sweet spot using several measurements around the listener
• Frequency domain: to reduce the computational complexity for real-time implementations
Wave-field Synthesis and Analysis
Reproduction systems based on stereo or multichannel techniques are designed to obtain an optimal acoustic sensation at only one point of the environment (the sweet spot).
WAVE FIELD SYNTHESIS (WFS): implements sound field reproduction over an extended listening area
WAVE FIELD ANALYSIS (WFA)
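The principle behind WFS can be illustrated with a minimal sketch, assuming a straight loudspeaker array on the line y = 0 and a virtual point source behind it: each loudspeaker is fed the source signal delayed by its distance to the virtual source and scaled by a distance- and angle-dependent gain. This is a simplified 2.5D point-source driving function (the frequency-dependent pre-filter and reference-line correction are omitted), not the A3lab implementation; the function name `wfs_point_source` and the 343 m/s speed of sound are assumptions. WFA is not covered here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degC

def wfs_point_source(source, speakers, c=SPEED_OF_SOUND):
    """Per-loudspeaker (delay in s, relative gain) for a virtual point source.

    Assumes a linear array on y = 0 radiating towards +y and a source behind
    it (y < 0). Simplified 2.5D driving function: the frequency-dependent
    pre-filter and reference-line amplitude correction are omitted.
    """
    sx, sy = source
    driving = []
    for px, py in speakers:
        dx, dy = px - sx, py - sy
        r = math.hypot(dx, dy)   # source-to-loudspeaker distance
        cos_phi = dy / r         # cosine of the angle w.r.t. the array normal (+y)
        driving.append((r / c, cos_phi / math.sqrt(r)))
    return driving

# Virtual source 1 m behind the centre of a three-loudspeaker array
for delay, gain in wfs_point_source((0.0, -1.0), [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0)]):
    print(f"delay = {delay * 1000:.3f} ms, gain = {gain:.3f}")
```

The centre loudspeaker, closest to the virtual source, fires first and loudest; the outer ones follow with a curved delay profile that reconstructs the spherical wavefront over the whole listening area rather than at a single sweet spot.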
Digital Effects: Reverberation
It is probably the audio effect most used by musicians during live performances and recording sessions.
HYBRID REVERBERATOR: based on a combined approach that uses
• measured impulse responses for the early reflections
• a synthetic IR for the late reflections
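A minimal sketch of the hybrid idea, assuming the synthetic late part is modelled as Gaussian noise under an exponential envelope that decays by 60 dB in RT60 seconds, and that its starting level is matched to the RMS of the last 64 samples of the measured early IR. These modelling choices and all function names are assumptions for illustration; the deck does not say how A3lab generates or joins the two parts.

```python
import math
import random

def synthetic_tail(fs, rt60, length_s, start_level, seed=0):
    """Gaussian noise under an exponential envelope that falls by 60 dB in rt60 seconds."""
    rng = random.Random(seed)
    k = math.log(1000.0) / (rt60 * fs)  # 60 dB corresponds to an amplitude factor of 1000
    return [start_level * math.exp(-k * n) * rng.gauss(0.0, 1.0)
            for n in range(int(length_s * fs))]

def hybrid_ir(measured_early, fs, rt60, tail_s, seed=0):
    """Measured early part followed by a synthetic late tail whose initial
    level matches the RMS of the last 64 samples of the measured part."""
    last = measured_early[-64:]
    level = math.sqrt(sum(v * v for v in last) / len(last))
    return list(measured_early) + synthetic_tail(fs, rt60, tail_s, level, seed)

def convolve(x, h):
    """Direct-form convolution; real IR lengths call for FFT-based convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y
```

Applying the effect is then `convolve(dry_signal, hybrid_ir(early, fs, rt60, tail_s))`. A practical version would crossfade the two parts instead of butting them together.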
3D Audio
Advanced Audio Spatializer
The system is composed of two parts:
• a sound rendering system based on a crosstalk canceller
• a listener position tracking system
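A crosstalk canceller is commonly designed per frequency bin as a regularized inverse of the 2x2 matrix of loudspeaker-to-ear transfer functions, so that each ear receives only its intended channel. The sketch below shows that generic Tikhonov-regularized design, C = (H^H H + beta I)^-1 H^H, at a single bin; it is an assumption for illustration, not necessarily the filter design of the Advanced Audio Spatializer, and the transfer values are made up. With a tracking system, H would be re-selected or re-measured as the listener moves.

```python
def crosstalk_canceller_bin(H, beta=1e-3):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H of a 2x2 transfer matrix.

    H[i][j] is the (complex) response from loudspeaker j to ear i at one
    frequency bin; beta limits the filter gain where H is ill-conditioned.
    """
    Hh = [[H[j][i].conjugate() for j in range(2)] for i in range(2)]  # H^H
    A = [[sum(Hh[i][k] * H[k][j] for k in range(2)) + (beta if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    return [[sum(A_inv[i][k] * Hh[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Illustrative symmetric bin: direct paths 1.0, crosstalk paths 0.5j
H = [[1.0 + 0j, 0.5j], [0.5j, 1.0 + 0j]]
C = crosstalk_canceller_bin(H, beta=1e-9)
print(matmul2(H, C))  # close to the identity matrix
```

Larger `beta` values trade cancellation accuracy for robustness, which matters when the listener moves away from the position for which H was measured.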
Acoustic Echo Cancellation
Stereo acoustic echo cancellers (SAECs) have become essential with the spread of multichannel systems, which were introduced to provide more realistic speaker localization.
DECORRELATION: a stage that weakens the linear relationship between the two input channels must be introduced to obtain good echo cancellation, since strongly correlated channels make the echo-path estimate non-unique.
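One well-known decorrelation technique is the half-wave rectifier nonlinearity proposed by Benesty et al.: a positive half-wave component is added to one channel and a negative one to the other, so the two channels are no longer linearly related. The sketch below shows that technique; the deck does not state which decorrelation method A3lab uses, and the default `alpha` is arbitrary (it trades decorrelation against audible distortion).

```python
def half_wave_decorrelate(left, right, alpha=0.5):
    """Half-wave rectifier decorrelation: x + alpha*(x+|x|)/2 on the left
    channel, x + alpha*(x-|x|)/2 on the right channel."""
    new_left = [x + alpha * (x + abs(x)) / 2.0 for x in left]
    new_right = [x + alpha * (x - abs(x)) / 2.0 for x in right]
    return new_left, new_right

left, right = half_wave_decorrelate([1.0, -1.0, 0.5], [1.0, -1.0, 0.5])
print(left, right)  # [1.5, -1.0, 0.75] [1.0, -1.5, 0.5]
```

Identical input channels come out different in a way that no linear filter can map one to the other, which is what the adaptive SAEC needs to converge to the true echo paths.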
Active Noise Cancellation
It is based on modifying the sound field through destructive wave interference, i.e., the principle of superposition.
• A real-time feedback system applied to real noise recorded in a yacht
• A quiet zone close to the pillows: microphones and loudspeakers positioned near the bed
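The superposition principle can be illustrated with a minimal adaptive sketch. Note that it is a simplified feedforward LMS canceller, not the feedback system described above, and it assumes an ideal secondary path (the loudspeaker-to-error-microphone response is taken as identity), so the residual is simply the disturbance plus the anti-noise. The tonal test signal and all parameter values are hypothetical.

```python
import math

def lms_anc(reference, disturbance, taps=16, mu=0.01):
    """Feedforward LMS anti-noise generation with an ideal secondary path.

    The loudspeaker emits -y(n); at the error microphone it superimposes on
    the disturbance d(n), leaving the residual e(n) = d(n) - y(n).
    """
    w = [0.0] * taps      # adaptive FIR controller
    x_buf = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, disturbance):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))  # anti-noise estimate
        e = d - y                                     # superposition at the error mic
        residual.append(e)
        w = [wi + mu * e * xi for wi, xi in zip(w, x_buf)]  # LMS weight update
    return residual

fs = 8000
ref = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
noise = [0.8 * math.sin(2 * math.pi * 200 * (n - 3) / fs) for n in range(fs)]
res = lms_anc(ref, noise)
```

A real system would replace the ideal path with an estimate of the secondary path (for example with the filtered-x LMS variant); a feedback system would also synthesize its reference from the error microphone signal instead of using a separate reference sensor.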
Speech-interfaced Systems
Distributed Speech-based Systems for Smart Homes
Pre-processing Framework for Speech-interfaced Systems
Distributed Speech-based System for Smart Homes
Main Features
– Distributed system for the recognition of building-automation voice commands and of distress calls, for emergency state detection.
– Two functional units: CMPU (Central Management and Processing Unit) and LMCU (Local Multimedia Control Unit)
ITAAL corpus
– 20 speakers (10 men and 10 women)
– Headset and distant microphones
[Figure: LMCU with Vocal Effort Classifier]
Advancements
– Small-vocabulary speech recognizers (based on the i-vector paradigm)
– Vocal Effort Classification (see figures)
– Seamless integration of the Sound Identification and Novelty Detection module
[Figure: vocal effort classifier block diagram (neutral corpus APASCI, GMM training, UBM, neutral and shout templates, supervector extraction, SVM training, SVM classification of speech by vocal effort)]
Speech-interfaced Tabletop
Fostering group conversations by displaying suitable stimuli on the tabletop display.
Stimuli can be floating words and/or pictures related to the topic of the discussion.
Topics are obtained by capturing spoken keywords.
Perception: captures the ongoing situation around the table (system status, conversation keywords).
Interpretation: infers the topic of the conversation from the recognized keywords and a set of predefined topics.
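The interpretation step can be sketched as keyword matching against predefined topics: count how many recognized keywords belong to each topic and pick the best-scoring one. The topic names, keyword sets and the function `infer_topic` are hypothetical; the deck does not describe the actual scoring used on the tabletop.

```python
from collections import Counter

# Hypothetical predefined topics, each described by a set of keywords.
TOPICS = {
    "travel": {"holiday", "flight", "hotel", "beach", "train"},
    "food": {"pizza", "recipe", "dinner", "wine", "cook"},
    "sport": {"football", "match", "team", "goal", "bike"},
}

def infer_topic(recognized_keywords, topics=TOPICS):
    """Return the predefined topic with the most keyword hits, or None if no keyword matches."""
    counts = Counter(k.lower() for k in recognized_keywords)
    scores = {topic: sum(counts[w] for w in words) for topic, words in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(infer_topic(["Pizza", "wine", "goal", "dinner"]))  # food
```

The returned topic would then select which floating words or pictures the display shows; returning None lets the system keep the current stimuli when no keyword is recognized.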
Pre-processing Framework for Speech-interfaced Systems
The pre-processing framework is composed of three cooperating modules in cascade:
• Speaker Diarization: drives the other two stages by informing them who is speaking.
• Blind Channel Identification (BCI): the source-microphone IRs are blindly identified when a single speaker is active.
• Speech Dereverberation: reverberation is compensated directly on the SIMO systems obtained from the original MIMO system, and the original sources are produced as output.
Noise-robust implementation.
[Figures: "Overall Framework" (microphone signals x_1(k), ..., x_N(k) processed by Speaker Diarization, BCI and Speech Dereverberation to yield source estimates ŝ_1(k), ..., ŝ_M(k)) and "Speaker Diarization" (Feature Extraction, GMM Training, Recognition Models)]
Speech Reinforcement
Speech reinforcement (SR) techniques aim to increase speech intelligibility in adverse environments where communication is difficult.
SR system: composed of at least one microphone, an amplifier and a loudspeaker.
Acoustic feedback occurs due to the acoustic coupling between the microphone and the loudspeaker.
Suitable algorithms are needed: PEM-AFROW-based solutions are adopted in this case.
Implementation on embedded systems and application in real environments.
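Why feedback control is needed can be quantified with the maximum stable gain: with a broadband amplifier gain G and a loudspeaker-to-microphone feedback path F, howling can only start at frequencies where |G F(w)| >= 1, so keeping G below 1 / max|F(w)| guarantees stability. The sketch below computes this magnitude-only bound, which is conservative because it ignores the phase condition; the example impulse responses are hypothetical. Feedback cancellers such as PEM-AFROW (not shown here) raise the achievable gain by subtracting an estimate of F.

```python
import cmath
import math

def max_stable_gain_db(feedback_ir, n_freqs=1024):
    """Conservative maximum stable broadband gain (dB) for a feedback path
    with impulse response `feedback_ir`: G < 1 / max |F(w)| over [0, pi]."""
    peak = 0.0
    for k in range(n_freqs):
        w = math.pi * k / (n_freqs - 1)
        response = sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(feedback_ir))
        peak = max(peak, abs(response))
    return -20.0 * math.log10(peak)

# Hypothetical feedback path: a single reflection attenuated by 6 dB after 2 samples
print(round(max_stable_gain_db([0.0, 0.0, 0.5]), 2))  # 6.02
```

The bound only becomes tight when the peak of |F| coincides with a frequency satisfying the loop phase condition.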
Application to the automotive dual-channel communication scenario
Two acoustic feedback and echo cancellation problems to be solved
Digital Music
Music Information Retrieval
Digital Music Effects
Music Synthesis
Music Information Retrieval
Acoustic Onset Detection
– Data-driven algorithm developed in collaboration with Technical University of Munich (Germany)
– Hybrid feature extraction module based on linear prediction in the wavelet domain and MFCCs
– Detector based on Bidirectional Long Short-Term Memory Recurrent Neural Networks
– Improvements over the recent state of the art (SotA)
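The last stage of the detector, thresholding and peak-picking on the onset detection function (ODF) produced by the network, can be sketched as follows: a frame is an onset if it is the maximum of its local window and exceeds a moving-median threshold plus an offset. The moving-median rule, the window size and the `delta` offset are assumptions; the deck does not specify the thresholding used.

```python
import statistics

def pick_onsets(odf, delta=0.1, window=3):
    """Frame indices where the ODF is a local maximum within +/- `window`
    frames and exceeds the local median plus `delta`."""
    onsets = []
    for n in range(len(odf)):
        lo, hi = max(0, n - window), min(len(odf), n + window + 1)
        neighborhood = odf[lo:hi]
        threshold = statistics.median(neighborhood) + delta
        if odf[n] == max(neighborhood) and odf[n] > threshold:
            onsets.append(n)
    return onsets

odf = [0.0, 0.1, 0.9, 0.2, 0.1, 0.0, 0.05, 0.8, 0.3, 0.1]
print(pick_onsets(odf))  # [2, 7]
```

Frame indices are converted to times by multiplying by the analysis hop size divided by the sample rate.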
[Figures: onset detection system (x[n], Framing / Windowing, Feature Extraction (WPEC, ASF), Neural Nets (RNN, BRNN, LSTM, BLSTM), ODF, Threshold / Peak-Picking, Onsets); WPEC feature extraction (DWPT with coif5, dec_level=8, Nbands=25, Band Energy, Compute Logarithm, Delta with win=2); LSTM memory cell (input, forget and output gates)]
Digital Music Effects
Virtual Acoustic Feedback
– In collaboration with Aalto University (Finland)
– Nonlinear Digital Oscillator with a second-order peaking filter in the feedback path
– Pitch tracking algorithm (SNAC) included to adaptively select the input tone
– Wave Digital Triode nonlinearity included to improve realism
– Advancements: rise-time, compressor, smoothing, gain pedal
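The SNAC function (the "specially normalised autocorrelation" used in the pitch detectors of McLeod and Wyvill) is n'(tau) = 2 r(tau) / m(tau), where r is the autocorrelation over the overlapping part of the frame and m is the sum of the squared samples in that overlap. The sketch below pairs it with a simple peak selection: skip the lobe around lag 0, take the first peak within `k` of the highest one, and refine it with parabolic interpolation. The selection rule, `k` and the lag limit of half the frame are assumptions, not necessarily the tracker used in the Virtual Acoustic Feedback effect.

```python
import math

def snac(frame, max_lag):
    """Specially normalized autocorrelation n'(tau) for tau = 0 .. max_lag - 1."""
    W = len(frame)
    values = []
    for tau in range(max_lag):
        r = sum(frame[j] * frame[j + tau] for j in range(W - tau))
        m = sum(frame[j] ** 2 + frame[j + tau] ** 2 for j in range(W - tau))
        values.append(2.0 * r / m if m > 0.0 else 0.0)
    return values

def estimate_pitch(frame, fs, k=0.9):
    """Pitch in Hz from the first SNAC peak within k of the highest peak, or None."""
    n = snac(frame, len(frame) // 2)
    # Skip the lobe around lag 0: start searching after the first negative value.
    start = next((t for t, v in enumerate(n) if v < 0.0), None)
    if start is None:
        return None
    peaks = [t for t in range(start + 1, len(n) - 1) if n[t - 1] < n[t] >= n[t + 1]]
    if not peaks:
        return None
    best = max(n[t] for t in peaks)
    t = next(t for t in peaks if n[t] >= k * best)
    # Parabolic interpolation around the chosen peak for sub-sample lag accuracy.
    a, b, c = n[t - 1], n[t], n[t + 1]
    denom = a - 2.0 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0.0 else 0.0
    return fs / (t + shift)

fs = 8000
tone = [math.sin(2 * math.pi * 220.0 * i / fs) for i in range(512)]
print(round(estimate_pitch(tone, fs), 1))
```

Taking the first strong peak rather than the highest one avoids octave errors, since lags at multiples of the period score almost as high as the period itself.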
Ibrida
– PureData tool for sound hybridization
– Wavelet-domain based
– Dynamic morphing driven by automatic onset detection
– OSC controllable
Speech-driven wah-wah effect
– Tuning the wah-wah effect by means of voice commands
– Low-complexity speech feature extraction
– Implementation on commercial
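A wah-wah is essentially a resonant band-pass filter whose centre frequency is swept; in the speech-driven version the sweep is set by voice commands instead of a pedal. The sketch below uses a Chamberlin state-variable band-pass filter and a hypothetical command-to-frequency table; the speech recognition and feature extraction stages are not shown, and neither the filter type nor the mapping is taken from the A3lab implementation.

```python
import math

# Hypothetical mapping from recognized voice commands to the wah centre frequency (Hz).
COMMAND_TO_FC = {"low": 400.0, "mid": 900.0, "high": 2000.0}

def wah(signal, fc, fs, q=4.0):
    """Band-pass `signal` with a Chamberlin state-variable filter tuned to fc."""
    f = 2.0 * math.sin(math.pi * fc / fs)  # frequency coefficient
    damping = 1.0 / q
    low = band = 0.0
    out = []
    for x in signal:
        low += f * band
        high = x - low - damping * band
        band += f * high
        out.append(band)  # band-pass output
    return out

def apply_voice_command(signal, command, fs):
    """Retune the wah according to an already recognized command."""
    return wah(signal, COMMAND_TO_FC[command], fs)
```

The Chamberlin structure is cheap (two multiplies for tuning) and stays stable for centre frequencies well below fs/6, which covers the usual wah range at 44.1 kHz.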
Music Synthesis
Physical Model of the Clavinet
– In collaboration with Aalto University (Finland)
– Recording and analysis of the instrument's characteristics (tone spectral characteristics, attack and decay, inharmonicity, spectral ripple, beating, amplifier and tone switches)
– Digital waveguide-based computational model
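The simplest member of the digital waveguide family is the Karplus-Strong plucked string: a delay line whose length sets the pitch, closed by a lowpass loss filter that makes high partials decay faster. The sketch below illustrates only this generic string loop; the Clavinet model adds the tangent excitation, inharmonicity, beating and pickup/amplifier stages listed above, none of which are reproduced here. Parameter names and defaults are assumptions.

```python
import random

def karplus_strong(freq, fs, duration, loss=0.996, seed=0):
    """Plucked-string tone from a Karplus-Strong loop (simplest waveguide-style string).

    The two-point average gives a loop delay of about period - 0.5 samples,
    so the pitch is approximately fs / (period - 0.5).
    """
    rng = random.Random(seed)
    period = max(2, int(round(fs / freq)))                   # delay-line length in samples
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]   # noise burst as the initial "pluck"
    out = []
    for n in range(int(fs * duration)):
        i = n % period
        current = line[i]
        out.append(current)
        # Two-point average = lowpass loss filter; `loss` adds broadband decay.
        line[i] = loss * 0.5 * (current + line[(i + 1) % period])
    return out

tone = karplus_strong(220.0, 44100, 1.0)
```

Fractional-delay filters in the loop would fix the pitch quantization, and allpass dispersion filters would add the inharmonicity measured on the real instrument.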
Wireless Music
Wireless MUsic Studio (We-MUST)
– HW/SW platform for wireless music
– Based on the open-source PureData and JackTrip software
– Latency down to 4 ms on a single link
– Developments currently ongoing (automatic device discovery and adaptive resampling)
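Where a few milliseconds of single-link latency come from can be estimated with a rough budget: each buffering stage holds whole audio periods, so latency scales with the period size divided by the sample rate, plus the network transit time. The model below (one capture period, one jitter-buffer packet, two playback periods, one period per packet) and its parameter names are hypothetical, not the actual We-MUST configuration.

```python
def one_way_latency_ms(fs, frames_per_period, capture_periods=1,
                       jitter_packets=1, playback_periods=2, network_ms=0.0):
    """Rough one-way latency budget for a streamed audio link.

    Hypothetical model: every buffer stage holds whole periods of
    `frames_per_period` samples, and each network packet carries one period.
    """
    period_ms = 1000.0 * frames_per_period / fs
    buffered_periods = capture_periods + jitter_packets + playback_periods
    return buffered_periods * period_ms + network_ms

# 32-frame periods at 48 kHz with 0.5 ms of network transit
print(one_way_latency_ms(48000, 32, network_ms=0.5))
```

Halving the period size halves the buffering terms, but it also halves the time the embedded board has to process each block, which is the usual limit on how low the latency can go.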
Application example
– BeagleBoards (BB) are used to process and send/receive the audio streams
[Figure: BeagleBoard]