Foreword p. xiii
Preface p. xvii
Contributors p. xix
Acronyms p. xxi
Fundamentals of Computational Auditory Scene Analysis p. 1
Human Auditory Scene Analysis p. 2
Structure and Function of the Auditory System p. 2
Perceptual Organization of Simple Stimuli p. 4
Perceptual Segregation of Speech from Other Sounds p. 5
Perceptual Mechanisms p. 8
Computational Auditory Scene Analysis (CASA) p. 11
What Is CASA? p. 11
What Is the Goal of CASA? p. 12
Why CASA? p. 13
Basics of CASA Systems p. 14
System Architecture p. 14
Cochleagram p. 15
Correlogram p. 19
Cross-Correlogram p. 21
Time-Frequency Masks p. 22
Resynthesis p. 23
CASA Evaluation p. 25
Evaluation Criteria p. 25
Corpora p. 26
Other Sound Separation Approaches p. 28
A Brief History of CASA (Prior to 2000) p. 30
Monaural CASA Systems p. 30
Binaural CASA Systems p. 34
Neural CASA Models p. 35
Conclusions p. 36
Acknowledgments p. 36
References p. 37
Multiple F0 Estimation p. 45
Introduction p. 45
Signal Models p. 46
Single-Voice F0 Estimation p. 47
Spectral Approach p. 48
Temporal Approach p. 50
Spectrotemporal Approach p. 53
Multiple-Voice F0 Estimation p. 55
Spectral Approach p. 56
Temporal Approach p. 57
Spectrotemporal Approach p. 59
Issues p. 61
Spectral Resolution p. 61
Temporal Resolution p. 62
Spectrotemporal Resolution p. 63
Other Sources of Information p. 64
Temporal and Spectral Continuity p. 64
Instrument Models p. 65
Learning-Based Techniques p. 67
Estimating the Number of Sources p. 68
Evaluation p. 69
Application Scenarios p. 70
Conclusion p. 71
Acknowledgments p. 72
References p. 72
Feature-Based Speech Segregation p. 81
Introduction p. 81
Feature Extraction p. 83
Pitch Detection p. 83
Onset and Offset Detection p. 83
Amplitude Modulation Extraction p. 85
Frequency Modulation Detection p. 88
Auditory Segmentation p. 90
What Is the Goal of Auditory Segmentation? p. 90
Segmentation Based on Cross-Channel Correlation and Temporal Continuity p. 92
Segmentation Based on Onset and Offset Analysis p. 93
Simultaneous Grouping p. 97
Voiced Speech Segregation p. 97
Unvoiced Speech Segregation p. 102
Sequential Grouping p. 106
Spectrum-Based Sequential Grouping p. 108
Pitch-Based Sequential Grouping p. 108
Model-Based Sequential Grouping p. 109
Discussion p. 110
Acknowledgments p. 111
References p. 111
Model-Based Scene Analysis p. 115
Introduction p. 115
Source Separation as Inference p. 115
Aspects of Model-Based Systems p. 125
Constraints: Types and Representations p. 126
Fitting Models p. 130
Generating Output p. 136
Discussion p. 139
Unknown Interference p. 139
Ambiguity and Adaptation p. 140
Relations to Other Separation Approaches p. 141
Conclusions p. 143
References p. 143
Binaural Sound Localization p. 147
Introduction p. 147
Physical and Physiological Mechanisms Underlying Auditory Localization p. 148
Physical Cues p. 148
Physiological Estimation of ITD and IID p. 150
Spatial Perception of Single Sources p. 152
Sensitivity to Differences in Interaural Time and Intensity p. 152
Lateralization of Single Sources p. 152
Localization of Single Sources p. 153
The Precedence Effect p. 154
Spatial Perception of Multiple Sources p. 155
Localization of Multiple Sources p. 155
Binaural Signal Detection p. 156
Models of Binaural Perception p. 158
Classical Models of Binaural Hearing p. 158
Cross-Correlation-Based Models of Binaural Interaction p. 160
Some Extensions to Cross-Correlation-Based Binaural Models p. 164
Multisource Sound Localization p. 168
Estimating Source Azimuth from Interaural Cross-Correlation p. 169
Methods for Resolving Azimuth Ambiguity p. 172
Localization of Moving Sources p. 175
General Discussion p. 175
Acknowledgments p. 177
References p. 178
Localization-Based Grouping p. 187
Introduction p. 187
Classical Beamforming Techniques p. 188
Fixed Beamforming Techniques p. 188
Adaptive Beamforming Techniques p. 189
Independent Component Analysis Techniques p. 190
Location-Based Grouping Using Interaural Time Difference Cue p. 191
Location-Based Grouping Using Interaural Intensity Difference Cue p. 199
Location-Based Grouping Using Multiple Binaural Cues p. 200
Discussion and Conclusions p. 202
Acknowledgments p. 202
References p. 203
Reverberation p. 209
Introduction p. 209
Effects of Reverberation on Listeners p. 211
Speech Perception p. 211
Sound Localization p. 213
Source Separation and Signal Detection p. 215
Distance Perception p. 219
Auditory Spatial Impression p. 219
Effects of Reverberation on Machines p. 220
Mechanisms Underlying Robustness to Reverberation in Human Listeners p. 224
The Role of Slow Temporal Modulations in Speech Perception p. 224
The Binaural Advantage p. 225
The Precedence Effect p. 226
Perceptual Compensation for Spectral Envelope Distortion p. 228
Reverberation-Robust Acoustic Processing p. 229
Dereverberation p. 229
Reverberation-Robust Acoustic Features p. 233
Reverberation Masking p. 235
CASA and Reverberation p. 237
Systems Based on Directional Filtering p. 237
CASA for Robust ASR in Reverberant Conditions p. 239
Systems that Use Multiple Cues p. 241
Discussion and Conclusions p. 242
Acknowledgments p. 244
References p. 244
Analysis of Musical Audio Signals p. 251
Introduction p. 251
Music Scene Description p. 252
Music Scene Descriptions p. 253
Difficulties Associated with Musical Audio Signals p. 255
Estimating Melody and Bass Lines p. 256
PreFEst-front-end: Forming the Observed Probability Density Functions p. 258
PreFEst-core: Estimating the F0's Probability Density Function p. 258
PreFEst-back-end: Sequential F0 Tracking by Multiple-Agent Architecture p. 262
Estimating Beat Structure p. 267
Estimating Period and Phase p. 268
Dealing with Ambiguity p. 270
Using Musical Knowledge p. 271
Estimating Chorus Sections and Repeated Sections p. 275
Extracting Acoustic Features and Calculating Their Similarity p. 278
Finding Repeated Sections p. 281
Grouping Repeated Sections p. 282
Detecting Modulated Repetition p. 284
Selecting Chorus Sections p. 285
Other Methods p. 285
Discussion and Conclusions p. 286
Importance p. 286
Evaluation Issues p. 287
Future Directions p. 288
References p. 289
Robust Automatic Speech Recognition p. 297
Introduction p. 297
ASA and Speech Perception in Humans p. 299
Speech Perception and Simultaneous Grouping p. 299
Speech Perception and Sequential Grouping p. 302
Speech Schemes p. 306
Challenges to the ASA Account of Speech Perception p. 309
Interim Summary p. 310
Speech Recognition by Machine p. 311
The Statistical Basis of ASR p. 311
Traditional Approaches to Robust ASR p. 313
CASA-Driven Approaches to ASR p. 315
Primitive CASA and ASR p. 316
Speech and Time-Frequency Masking p. 316
The Missing-Data Approach to ASR p. 318
Marginalization-Based Missing-Data ASR Systems p. 321
Imputation-Based Missing-Data Solutions p. 325
Estimating the Missing-Data Mask p. 328
Difficulties with the Missing-Data Approach p. 330
Model-Based CASA and ASR p. 333
The Speech Fragment Decoding Framework p. 334
Coupling Source Segregation and Recognition p. 337
Discussion and Conclusions p. 340
Concluding Remarks p. 343
Neural and Perceptual Modeling p. 351
Introduction p. 351
The Neural Basis of Auditory Grouping p. 352
Theoretical Solutions to the Binding Problem p. 352
Empirical Results on Binding and ASA p. 353
Models of Individual Neurons p. 354
Relaxation Oscillators p. 354
Spike Oscillators p. 355
A Model of a Specific Auditory Neuron p. 357
Models of Specific Perceptual Phenomena p. 359
Perceptual Streaming of Tone Sequences p. 359
Perceptual Segregation of Concurrent Vowels with Different F0s p. 367
The Oscillatory Correlation Framework for CASA p. 372
Speech Segregation Based on Oscillatory Correlation p. 372
Schema-Driven Grouping p. 376
Discussion p. 378
Temporal or Spatial Coding of Auditory Grouping p. 379
Physiological Support for Neural Time Delays p. 379
Convergence of Psychological, Physiological, and Computational Approaches p. 380
Neural Models as a Framework for CASA p. 380
The Role of Attention p. 381
Schema-Based Organization p. 381
Acknowledgments p. 381
References p. 381
Index p. 389