2.4 Spatial release from masking in speech perception
2.4.3 Spatial release from informational masking
Spatial release from informational masking occurs when a perceived difference in spa- tial attributes provides the listener with additional information about the identity of the target and/or the masker. When there exist other robust cues that can also differentiate the target speech, such as differences in pitch, vocal-tract length, or in- tonation, adding a spatial separation may not result in a significant improvement in intelligibility (Darwin & Hukin, 2000a). If other attributes are similar, however, spatial separation can provide a robust cue to segregate the speech mixture. The ben- efit of spatial separation in a speech intelligibility experiment with the CRM stimuli
with two competing, same-sex talkers positioned symmetrically at ±90◦
0◦ target talker amounts to approximately 12 dB relative to the frontal co-located condition (Marrone, Mason, & Kidd., 2008).
An example that illustrates the different roles of spatial separation in alleviating the effects of energetic and informational masking in speech perception is presented in the study by Freyman et al. (1999). The experiment utilized stimuli consisting of lead-lag stimulus pairs separated by 4 ms. The lead and lag stimuli were presented
from two loudspeakers located at 0◦
and +60◦
. In the “FR-FR” condition, the lead stimuli of the target and the masker were presented from the front loudspeaker. 4-ms following the lead, the lag stimuli were presented from the right (+60◦) loudspeaker. In the “FR-RF” condition, the target’s lead stimulus was presented from the front loudspeaker and its lag was presented from the right, while the masker’s lead stimu- lus was presented from the right and its lag was presented from the front. Switching the lead and lag loudspeakers did not appreciably affect the long-term spectra of the lead-lag stimuli. However, due to the precedence effect, the FR-stimulus was per- ceived to be at the front while the RF-stimulus was perceived to be at the right. Using this experimental configuration, the intelligibility of female speech was mea- sured in the presence of a masker. The masker was either a speech-shaped noise (energetic masking) or another speech sample spoken by a different female talker (in- formational masking). Consistent with the prediction that energetic masking can be mainly explained by the amount of overlap in the target’s and masker’s energy, the speech intelligibility in the FR-FR condition did not differ significantly from that in the FR-RF condition when the speech-spectrum noise acted as a masker. For the speech-on-speech masking, however, the intelligibility in the FR-RF condition was significantly better than that in the FR-FR condition.
Hawley, Litovsky, and Culling (2004) examined the effects of increasing the num- ber of interferers on spatial release from energetic and informational masking. Speech- spectrum noise and speech from the same talker as the target were used to represent energetic and informational masking, respectively. Adding a number of interferers, particularly when they were distributed more evenly around the listener, increased the masker energy at the better ear as well as decorrelating the binaural input, reducing both the better-ear advantage and binaural squelch. Spatial release in speech-on- speech masking, however, was robust in all spatial configurations tested (all maskers in the same position or spatially distributed across the hemifields); the proximity of the target to the interfering sources played a more important role than the number of interferers.
Spatial release from informational masking can be partly attributed to the role of spatial cues in selective attention. Attention can either be driven by salient attributes of the selected object (stimulus-driven) or be volitionally directed based on the task demand (task-driven). To examine the role of spatial attention in resolving speech-on- speech masking, Kidd, Arbogast, Mason, and Gallun (2005) manipulated listeners’ expectation of the target location in a multi-talker speech perception task. Three CRM sentences spoken by different same-sex talkers were simultaneously presented to the listeners at equal level. The talkers were located at -60◦
, 0◦
, and +60◦
, and the target sentence was presented from one of these locations with various degrees
of certainty. In the condition with 100% certainty, p = 1, the target was always
presented at the same location (-60◦
, 0◦
, or +60◦
), while in the condition with the least certainty, p= 0.33, the target location in each trial was chosen randomly with
all target locations, indicating that each sentence could be fully segregated from the mixture. Consistent with the ability to direct attention, a decline in performance was observed with decreasing certainty in target location, with performance reaching 67% on average in the least certain condition.
Selective attention is comprised of two different aspects, namely reorientation and maintenance (Kahneman, 1973, pp. 119). Maintenance of attention is associated with the ability to continuously attend to the selected message; for example, in a task in which the listener is required to attend and verbally repeat a message played to one ear while ignoring the message played to the other ear, e.g., (Cherry, 1953). Reorientation of attention is associated with a switch in the identity of the target message; for example, when a target speech is played with random switches from one ear to the other. It is known that task performance is impaired following an attentional switch but that it improves over time (Kahneman, 1973, pp. 119). That is, it takes some time for selective attention to build up. Furthermore, continuity in the attended attribute over time facilitates refinement in the spatial tuning of selective attention.
The build up of spatial selective attention and the cost of switching attention can be illustrated in the study of Best, Ozmeral, Kopˇco, and Shinn-Cunningham (2008). In this study, listeners were required to identify a sequence of target digits; each digit was presented simultaneously with 4 other spatially-separated competing digits. The location of the target digits were fixed or randomized. A set of LED lights informed the listeners of the location of each of the target digits, either synchronously with or in advance (up to 1 s) of the presentation of the digits. The results showed that listeners performed better in the fixed target location condition (i.e., more correct digits were
recalled). Providing an advanced visual cue of the target location reduced, but did not eliminate the cost of switching location.
The tuning of spatial release from informational masking is explored by Marrone et al. (2008). In this experiment, speech intelligibility in a multi-talker situation was tested with the CRM stimuli. Three CRM sentences spoken by different same-sex talkers were presented simultaneously to the listener. The target talker was always positioned at 0◦
, and the two maskers were symmetrically positioned to the left and right of the target talker at ±15◦, ±45◦, ±90◦(spatially separated condition), or
with both maskers co-located with the target at 0◦. Spatial release from masking
was measured as the difference between the speech reception thresholds in the co- located and spatially separated conditions. Results from the study indicated that the benefit of spatial separation increased with the amount of spatial separation. The benefit was particularly apparent when the spatial separation was small, but the increase in benefit became smaller with increasing separation. The amounts of spatial release from masking were 8.7 dB, 12.3 dB, and 12.6 dB for 15◦, 45◦, and 90◦ spatial separation, respectively. That is, the largest increase in masking release was obtained
when the spatial separation was increased from 0◦
to 15◦
, and no further significant
increase was obtained by increasing the spatial separation from 45◦
to 90◦ .