DSP Algorithms and Adaptive systems for AUDIO applications

(1)

MMSP 2014


(2)

A3lab people

Full professor
•  Francesco Piazza

Researcher
•  Simone Fiori
•  Stefano Squartini
•  Christian Morbidoni

Post Doc Researcher
•  Stefania Cecchi
•  Paolo Peretti
•  Emanuele Principi
•  Michele Nucci
•  Marco Grassi

Ph.D. Student
•  Marco Virgulti
•  Leonardo Gabrielli
•  Francesco Faccenda
•  Marco Fagiani
•  Marco Severini

(3)


A3lab activities

Main research activity area
•  DSP algorithms for audio and multimedia applications
•  Computational Intelligence algorithms for multimedia
•  Real-time processing oriented

Teaching activity
•  Circuit Theory
•  Digital Signal Processing and Computational Intelligence
•  Electrical Machines and Systems

Projects and collaboration
•  Projects funded by public agencies (including the European Community) and private companies
•  Several active collaborations with international academies and enterprises

(4)

A3lab facilities

Laboratory Audio-DSP
•  Equipped with professional audio instrumentation (e.g., professional sound cards, microphones, loudspeakers, etc.)

Semi Anechoic Chamber
•  The chamber dimensions are 9 m x 7 m x 5 m
•  This chamber is qualified ISO

(5)


Some A3lab projects

hArtes – FP6 [2006-2010] – 634K
DISCOVERY – eContentPlus [2006-2010] – 300K
SEMLIB – FP7 [2010-2012] – 300K
TASCA – POR [2010-2011] – 144K
SAIYL – POR [2009-2010] – 111K
HOMELINE – POR [2009-2010] – 40K
Moretti Forni – “Giovane Tecnologo” [2009-2010] – 40K
FBT – “Giovane Tecnologo” [2009-2010] – 40K
eDoor – 598 [2007-2008] – 56K
CMT – 598 [2006-2007] – 20K
Line Arrays – [2007-2009] – 99K
Others – [2008-2010] – 50K
COST A32 – [2006-2010]
COST 2102 – [2006-2010]
COST 277 – [2001-2005]

[Legend: Funded European Projects, Funded Regional Projects, Funded Private Projects, EU COST Actions]

(6)

A3lab collaborations

Academia/research centers (formal):
University of Illinois at Chicago (USA), South China University of Technology (China), Fondazione Bruno Kessler (Italy), Università La Sapienza, Digital Enterprise Research Institute (Ireland), Texas Instruments European University Program (equipment donation received from 2010 to date).

Academia/research centers (informal) (active Erasmus links):
Riken Institute (Japan), University of Stirling (UK), University of Windsor (Canada), University of Aachen (Germany), Fraunhofer Institute (Germany), Escola Universitaria Politecnica de Matarò (Spain), Aalto University (Finland), Technical University of Munich (Germany) and others

Companies/Enterprises:
Texas Instruments, Thales, Thomson, HP, Roland Europe, KORG, Faital, CMT, FBT, Indesit, Radvision Italia, Proietti Planet, AYT, Microhard, Imolinfo, NET7, and more

(7)


A3lab research fields

Audio Rendering
•  Optimize the listening experience according to the characteristics of the acoustic environment and the user needs

Speech-interfaced Systems
•  Systems where the use of speech is involved to enable a certain service

Digital Music
•  Digital music processing

(8)

Audio Rendering
•  Multichannel Equalization
•  Wave-field Synthesis and Analysis
•  Reverberation
•  3D audio
•  Acoustic Echo Cancellation

(9)


Audio Rendering

Multichannel Equalization

Fixed
•  To compensate small environments

Adaptive
•  To consider variable environments

Multipoint
•  To enlarge the sweet spot with several measurements around the listener

Frequency domain
•  To reduce the computational complexity for real-time approaches
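The multipoint and frequency-domain ideas above can be sketched together as a regularized least-squares inverse computed bin by bin. This is only a minimal illustration, not the lab's implementation: the function name `multipoint_inverse`, the input layout (one list of complex bins per measurement point) and the regularization constant `beta` are all assumptions made for the example.

```python
def multipoint_inverse(responses, beta=1e-3):
    """Regularized least-squares equalizer over several measurement points.

    responses: list of M frequency responses, each a list of K complex bins
               (hypothetical input, e.g. FFTs of measured impulse responses).
    beta:      assumed regularization constant; it limits the boost applied
               where all the measured responses have a deep notch.
    Returns the K complex bins of a single equalizer E(k) that minimizes
    sum_m |H_m(k) E(k) - 1|^2 + beta |E(k)|^2.
    """
    n_bins = len(responses[0])
    equalizer = []
    for k in range(n_bins):
        # Numerator: sum of the conjugated responses at bin k.
        num = sum(h[k].conjugate() for h in responses)
        # Denominator: total energy at bin k plus the regularization term.
        den = sum(abs(h[k]) ** 2 for h in responses) + beta
        equalizer.append(num / den)
    return equalizer


# With a single measurement point and beta = 0 this reduces to plain inversion.
eq = multipoint_inverse([[2 + 0j, 0.5j]], beta=0.0)
```

With several points the equalizer becomes a compromise among the measured positions, which is what widens the equalized zone at the cost of a less exact correction at any single point.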

(10)

Wave-field Synthesis and Analysis

Audio Rendering

Reproduction systems based on stereo or multichannel techniques are designed to obtain an optimal acoustic sensation at only one point of the environment (the sweet spot).

WAVE FIELD SYNTHESIS (WFS) implements sound field reproduction

WAVE FIELD ANALYSIS (WFA)

(11)


Digital Effects: Reverberation

It is probably the most widely used audio effect, employed by musicians during live performances and recording sessions.

HYBRID REVERBERATOR: based on a combined approach that uses
•  measured impulse responses for the early reflections
•  synthetic IR for the late reflections
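The combined approach can be sketched as joining the first part of a measured impulse response with a synthetic tail. The sketch below assumes the tail is exponentially decaying Gaussian noise, which is one common way to synthesize late reverberation; the slide does not say how the synthetic IR is generated. The function name and the parameters `mix_time_s`, `rt60_s` and `length_s` are hypothetical.

```python
import math
import random


def hybrid_impulse_response(measured_ir, fs, mix_time_s, rt60_s, length_s, seed=0):
    """Measured early part followed by a synthetic, exponentially decaying tail.

    measured_ir: list of samples of a measured impulse response.
    fs:          sampling rate in Hz.
    mix_time_s:  assumed instant where the measured part ends.
    rt60_s:      assumed reverberation time of the synthetic tail.
    length_s:    total length of the returned impulse response.
    """
    rng = random.Random(seed)
    n_mix = min(round(mix_time_s * fs), len(measured_ir))
    n_total = round(length_s * fs)
    early = list(measured_ir[:n_mix])
    # Match the tail level to the RMS of the last 10% of the early part.
    ref = early[-max(1, n_mix // 10):] if early else [1.0]
    level = math.sqrt(sum(x * x for x in ref) / len(ref))
    # A 60 dB amplitude decay over rt60_s seconds is exp(-ln(1000) * t / rt60_s).
    decay = math.log(1000.0) / rt60_s
    tail = [
        level * rng.gauss(0.0, 1.0) * math.exp(-decay * (n - n_mix) / fs)
        for n in range(n_mix, n_total)
    ]
    return (early + tail)[:n_total]


measured = [1.0, 0.5, -0.25, 0.125] + [0.0] * 96
ir = hybrid_impulse_response(measured, fs=1000, mix_time_s=0.004,
                             rt60_s=0.5, length_s=0.2)
```

A practical design would likely crossfade the two parts and shape the tail per frequency band; both are left out here to keep the sketch short.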

(12)

3D Audio

Audio Rendering

Advanced Audio Spatializer

The system is composed of two parts:
•  a sound rendering system based on a crosstalk canceller
•  a listener position tracking system
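A crosstalk canceller can be sketched, for one frequency bin, as a regularized inverse of the 2x2 matrix of loudspeaker-to-ear responses. This is a generic textbook formulation under stated assumptions, not the spatializer described on the slide: the names `crosstalk_canceller_bin` and `matmul2`, the argument order and the regularization `beta` are hypothetical. In a tracked system the responses would be updated as the listener moves.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
        for i in range(2)
    ]


def crosstalk_canceller_bin(h_ll, h_lr, h_rl, h_rr, beta=0.0):
    """Canceller matrix C for one frequency bin.

    h_xy is the complex response from loudspeaker y to ear x (assumed
    convention). C = (H^H H + beta I)^-1 H^H, so H C approximates the
    identity: each ear receives only its own binaural channel.
    """
    h = [[h_ll, h_lr], [h_rl, h_rr]]
    # Conjugate transpose of H.
    hh = [
        [h[0][0].conjugate(), h[1][0].conjugate()],
        [h[0][1].conjugate(), h[1][1].conjugate()],
    ]
    a = matmul2(hh, h)
    a[0][0] += beta
    a[1][1] += beta
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [
        [a[1][1] / det, -a[0][1] / det],
        [-a[1][0] / det, a[0][0] / det],
    ]
    return matmul2(a_inv, hh)


# Symmetric example: direct paths of gain 1, cross paths of gain 0.5.
c = crosstalk_canceller_bin(1 + 0j, 0.5 + 0j, 0.5 + 0j, 1 + 0j)
```

Setting `beta` above zero trades some residual crosstalk for smaller filter gains at frequencies where the matrix is close to singular.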

(13)


Acoustic Echo Cancellation

Audio Rendering

Stereo acoustic echo cancellers (SAECs) have become essential with the spread of multichannel systems, which were introduced to provide more realistic performance in terms of speaker localization.

DECORRELATION, which weakens the linear relationship between the two input channels, must be introduced in order to obtain good echo cancellation.
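The slide does not say which decorrelation method is used. As an illustration only, the sketch below applies a half-wave rectifier nonlinearity, a widely cited option in the stereo echo cancellation literature: a scaled positive half-wave is added to one channel and a scaled negative half-wave to the other. The function name and the default `alpha` are assumptions.

```python
def decorrelate_stereo(left, right, alpha=0.5):
    """Half-wave rectifier decorrelation of a stereo pair.

    left, right: lists of samples.
    alpha:       assumed distortion amount; larger values decorrelate more
                 but are more audible.
    """
    # Left channel: add a scaled positive half-wave, (x + |x|) / 2.
    out_left = [x + alpha * (x + abs(x)) / 2.0 for x in left]
    # Right channel: add a scaled negative half-wave, (x - |x|) / 2.
    out_right = [x + alpha * (x - abs(x)) / 2.0 for x in right]
    return out_left, out_right


out_l, out_r = decorrelate_stereo([1.0, -1.0], [1.0, -1.0], alpha=0.5)
# out_l == [1.5, -1.0], out_r == [1.0, -1.5]
```

Because the two channels are distorted differently, identical inputs no longer produce identical outputs, which is what lets the adaptive filters identify the two echo paths separately.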

(14)

Active Noise Cancellation

Audio Rendering

It is based on sound field modification by destructive wave interference, i.e., the principle of superposition
•  A real-time feedback system applied to real noise recorded in a yacht
•  Quiet zone close to the pillows: microphones and loudspeakers positioned near the bed
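The superposition principle can be made concrete with a small calculation: for a single tone, the residual after adding an anti-noise signal depends on how well its amplitude and phase match the ideal inverted copy. The sketch below is a worked illustration under that single-tone assumption, not a model of the feedback system on the slide; the function name and arguments are hypothetical.

```python
import cmath
import math


def residual_attenuation_db(gain, phase_error_deg):
    """Attenuation of a tone when anti-noise is superimposed on it.

    gain:            amplitude of the anti-noise relative to the noise.
    phase_error_deg: deviation from the ideal 180 degree phase inversion.
    The noise is the phasor 1 and the anti-noise is -gain * exp(j * error),
    so the residual amplitude is |1 - gain * exp(j * error)|.
    """
    error = math.radians(phase_error_deg)
    residual = abs(1.0 - gain * cmath.exp(1j * error))
    if residual == 0.0:
        return math.inf  # perfect cancellation
    return -20.0 * math.log10(residual)


# A 10% amplitude mismatch with perfect phase limits attenuation to about 20 dB.
print(round(residual_attenuation_db(0.9, 0.0), 2))
```

The same formula shows why the quiet zone is local: away from the control microphones the phase relation drifts, and the attenuation falls off quickly.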

(15)




•  Distributed Speech-based Systems for Smart Homes
•  Pre-processing Framework for Speech-interfaced Systems

(16)

Distributed Speech-based System for Smart Homes

•  Main Issues
–  Distributed system for recognition of building automation vocal commands and of distress calls for emergency state detection.
–  Two functional units: CMPU (Central Management and Processing Unit) and LMCU (Local Multimedia Control Unit)

•  ITAAL corpus
–  20 people involved (10 men and 10 women)
–  Headset & Distant Microphones

(17)


Distributed Speech-based System for Smart Homes

LMCU w/ Vocal Effort Classifier

•  Advancements
–  Small-vocabulary speech recognizers (based on the i-vectors paradigm)
–  Vocal Effort Classification (see figures)
–  Seamless integration of Sound Identification and Novelty Detection module

[Figure: vocal effort classifier. Blocks: GMM Training on a neutral corpus (APASCI), UBM, Neutral templates, Shout templates, Supervectors Extraction, SVM Training, SVM Classification, Model. Input: Speech. Output: Vocal effort]

(18)

Speech-interfaced Tabletop

•  Fostering group conversations by visualizing suitable stimuli on the tabletop display
•  Stimuli can be floating words and/or pictures. Stimuli are related to the topic of the discussion
•  Topics are obtained by capturing spoken keywords
•  Perception: captures the ongoing situation around the table (status of the system, conversation keywords).
•  Interpretation: derives the topic of the conversation from recognized keywords and predefined topics.

(19)


Pre-processing framework for Speech-interfaced Systems

•  Pre-processing framework composed of three cooperating modules in cascade
•  Speaker Diarization: it drives the other two stages by informing them who is speaking.
•  Blind Channel Identification: the source-microphone IRs are blindly identified when a single speaker is active.
•  Speech Dereverberation: reverberation is compensated directly on the SIMO systems obtained from the original MIMO one, and the original sources are yielded as output.
•  Noise robust implementation.

[Figure: Overall Framework. Blocks: Speaker Diarization, BCI, Speech Dereverberation. Inputs: microphone signals x_1(k), ..., x_n(k), ..., x_N(k). Outputs: source estimates ŝ_1(k), ..., ŝ_M(k). Other labels: P_1, ..., P_M, h]

[Figure: Speaker Diarization. Blocks: Feature Extraction, GMM Training, Training, Recognition, Models]

(20)

Speech-Reinforcement

•  Speech reinforcement (SR) techniques aim to increase speech intelligibility in adverse environments where communication is difficult.
•  SR system: composed of at least one microphone, an amplifier and a loudspeaker.
•  Acoustic feedback occurs due to the acoustic coupling between the microphone and the loudspeaker.
•  Suitable algorithms are needed: PEM-AFROW based solutions are adopted in this case.
•  Implementation on embedded systems and application in real environments.

(21)


Speech-Reinforcement

 

•  Application to the automotive dual-channel communication scenario
•  Two Acoustic Feedback and Echo Cancellation problems to solve

(22)

Digital Music



•  Music Information Retrieval
•  Digital Music Effects
•  Music Synthesis

(23)


Music Information Retrieval

•  Acoustic Onset Detection
–  Data-driven algorithm developed in collaboration with Technical University of Munich (Germany)
–  Hybrid Feature Extraction module based on linear prediction in the wavelet domain and MFCCs
–  Detector based on Bidirectional Long Short-Term Memory Recurrent Neural Networks
–  Improvements over the recent SotA

[Figure: onset detection system. Blocks: Framing / Windowing, Feature Extraction (WPEC, ASF), Neural Nets (RNN, BRNN, LSTM, BLSTM), Threshold, Peak-Picking. Input: x[n]. Signals: F_{N,M}, ODF. Output: Onsets]

[Figure: WPEC extraction. Blocks: DWPT (coif5, dec_level=8, Nbands=25), Band Energy, Compute Logarithm, WPEC (25), Delta (win=2). Input: x[n]]

[Figure: LSTM memory block. Labels: Input, Input Gate, Forget Gate, Output Gate, Memory Cell, Output]
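The last stage of such a detector, thresholding and peak-picking on the onset detection function (ODF), can be sketched as follows. This is a minimal illustration with a fixed threshold; the function name `pick_onsets` and the parameters `threshold` and `min_gap` are assumptions, and the slide does not state whether the actual system uses a fixed or adaptive threshold.

```python
def pick_onsets(odf, threshold, min_gap=1):
    """Frame indices where the ODF has a local maximum above a threshold.

    odf:       list of onset detection function values, one per frame
               (e.g. the output of the neural network).
    threshold: assumed fixed decision threshold.
    min_gap:   assumed minimum distance, in frames, between two onsets.
    """
    onsets = []
    for i in range(1, len(odf) - 1):
        # Local maximum: strictly above the previous frame, not below the next.
        is_peak = odf[i] > odf[i - 1] and odf[i] >= odf[i + 1]
        far_enough = not onsets or i - onsets[-1] >= min_gap
        if is_peak and odf[i] >= threshold and far_enough:
            onsets.append(i)
    return onsets


odf = [0.0, 0.1, 0.9, 0.2, 0.1, 0.6, 0.7, 0.3, 0.05, 0.4, 0.1]
print(pick_onsets(odf, threshold=0.5))  # prints [2, 6]
```

The peak at frame 9 is rejected because it stays below the threshold; raising `min_gap` to 5 would also reject frame 6 as too close to frame 2.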

(24)

Digital Music Effects

 

•  Virtual Acoustic Feedback

–  In collaboration with Aalto University (Finland)

–  Nonlinear Digital Oscillator with a second-order peaking filter in the feedback path

–  Pitch tracking algorithm (SNAC) included to adaptively select the input tone

–  Wave Digital Triode nonlinearity included to improve realism

–  Advancements: rise-time, compressor, smoothing, gain pedal
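The second-order peaking filter in the feedback path can be illustrated with the widely used "Audio EQ Cookbook" biquad design. Whether the original effect uses this particular design is not stated on the slide, so treat the formulas, the function names and the example values (1 kHz centre, 6 dB gain, Q of 2, 48 kHz sampling rate) as assumptions.

```python
import cmath
import math


def peaking_biquad(f0, fs, gain_db, q):
    """Coefficients (b, a) of a peaking biquad, normalized so that a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)  # square root of the linear peak gain
    w0 = 2.0 * math.pi * f0 / fs      # centre frequency in radians per sample
    alpha = math.sin(w0) / (2.0 * q)  # bandwidth term
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


b, a = peaking_biquad(f0=1000.0, fs=48000.0, gain_db=6.0, q=2.0)
peak = magnitude_db(b, a, 1000.0, 48000.0)  # gain at the centre frequency
```

In a feedback loop, the peak makes the loop gain exceed unity only around the selected tone, so the oscillation builds up at that pitch while other frequencies decay.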

(25)


Digital Music Effects

 

•  Ibrida
–  PureData tool for sound hybridization
–  Wavelet domain based
–  Dynamic morphing driven by automatic onset detection
–  OSC controllable

•  Speech-driven wah-wah effect
–  Tuning the wah-wah effect by means of voice commands
–  Low-complexity speech feature extraction
–  Implementation on commercial

(26)

Music Synthesis

 

•  Physical Model of the Clavinet
–  In collaboration with Aalto University (Finland)
–  Recording and analysis of the different issues (tone spectral characteristics, attack and decay, inharmonicity, spectrum ripple, beating, amplifier and tone switches)
–  Digital Wave Guide based computational

(27)


Wireless Music

 

•  Wireless MUsic Studio (We-MUST)
–  HW/SW platform for wireless music
–  Based on the PureData and Jacktrip open-source SWs
–  Latency down to 4 ms single-link
–  Developments are currently ongoing (automatic device discovery and adaptive resampling)

•  Application example
–  BeagleBoards (BB) are used to process and send/receive the audio streams

[Figure: BeagleBoard]