The Perception of Multiple Sounds
SEGREGATION OF A SEQUENCE OF SOUNDS INTO TWO SOUND STREAMS
The loudness difference between the two sets of alternating sounds in the previ-ous section was essential for listeners to hear the weaker tone as continuing straight through the louder tones or noise bursts. But what occurs when two tones differing in frequency but approximately equal in loudness are alternated?
Miller and Heise (1950) were the first to report that if the frequency difference is small, the tone pulses are heard as a coherent sequence of tones with the pitch going up and down periodically, but as the frequency difference is made larger, the coherence disappears and the series of tones splits up into two streams, one consisting of the lower frequency tones, the other of the higher frequency tones (see Fig. 3.6).
In a subsequent study (Heise & Miller, 1951), this phenomenon was tested for sequences of eleven 125-msec repeatedly presented tone pulses, with all quencies fixed except the sixth pulse. The listeners were asked to adjust the fre-quency of the sixth pulse to such a value that it just separated perceptibly (i.e., appeared in a different stream) from the other ones. The solid points in Fig. 3.7
FIG. 3.4. The lines represent the harmonics of a tone partly alternated with noise bursts. In panel A, the lower harmonics are presented continuously, in panel B, the lower and higher har-monics are presented alternately.
give three examples of these fixed-pulse sequences, with the separation thresh-old of the variable sixth pulse represented by the open points. These diagrams il-lustrate that hearing all tones as a single stream depends on the frequency pattern of the entire series of tones.
It is evident that perceptual streaming plays an important role in music.
Composers long ago discovered the phenomenon and have often exploited it in compositions. For example, Johann Sebastian Bach depended on listeners’
streaming ability to suggest two melody lines in his Partita Nr. 3 for solo violin in E major.
The phenomenon was also studied extensively by van Noorden (1975) in his doctoral thesis, from which most data discussed next are taken. Figure 3.8
FIG. 3.5. The lines represent the harmonics of a series of tone pulses with varying pitch.
Panel A shows the tones interrupted by noise bands of which the latter four cover transitions from one tone to the next. Despite this temporal uncertainty, the tones are heard as a melody of equally lasting tones. By listening first repeatedly to the tone pulses without noise (panel B) the rhythm heard with noise adapts to this condition.
shows that there is a wide range of frequency differences within which a single coherent sound stream as well as two separated streams may be heard. As the author wrote:
It would seem as if the percepts are mutually exclusive in the perception. When one lis-tens without special attention, one hears first the one percept and then the other. The change-over is then spontaneous, and appears to occur at random moments. (p. 9) FIG. 3.6. Successive tone pulses with a small frequency difference are heard as a single sound stream, while pulses with a large frequency difference are heard as two independent streams.
FIG. 3.7. The panels illustrate that hearing a tone pulse (open symbol) as not belonging to a series of successive tone pulses (solid symbols) depends on the frequency pattern (based on data from Heise & Miller, 1951).
Thus this instance of auditory streaming bears a striking resemblance to “visual streaming” of shapes, such as the faces/vase visual illusion.
Van Noorden found that although the lower boundary of mixed range where both percepts can occur is rather independent of pulse duration, the upper boundary increases considerably for longer tone pulses. Streaming breaks down for tones longer than about 200 msec, which are perceived as independent from each other rather than as a pattern.
The boundary curves in Fig. 3.8 are for repeatedly alternating tone pulses, and represent the optimal condition for hearing two streams. Subsequent data reported by van Noorden indicate that if tone sequences are shorter, the fre-quency difference between tones can be larger and still permit streaming. In the extreme case of only two tone pulses per sequence, a very much larger frequency difference can be tolerated than in the case of multiple pulses. This can be veri-fied easily with 100-msec pulses having a frequency ratio of 2:3; the two tone pulses of a single pair sound as though they belong to each other as a melodic in-terval, but this quality quickly disappears if the pair is repeated over and over without pauses.
This dependence of the stream-segregation boundary on the length of the se-quence can be interpreted as indicating that previous tones play a role in the grouping process by biasing expectations. Beauvois and Meddis (1997) studied this bias effect by presenting listeners with an initial 10 sec of repeated 1,000-Hz tone pulses. Subsequently, after a variable silent period, a sequence of eight tone
FIG. 3.8. Frequency difference for which pulses of a higher tone, alternated with pulses of 1000 Hz, are heard as one stream or as two streams (based on data from van Noorden, 1975).
pulses, alternating between 1,000 Hz and 1,420 Hz, was presented. The results indicated that the alternating pulses were much more frequently heard as segre-gating when presented after a short silent period than when presented after a long period. Apparently, the system’s “conclusion” that the four deviating pulses of 1,420 Hz could not belong to the sequence faded gradually as a func-tion of time.
An unexpected streaming pattern occurs when a sequence of tone pulses with increasing frequency is alternated with a sequence of tones with decreasing frequency (see Fig. 3.9). One might expect that the two sequences would seem to “pass” each other in a similar way as visual patterns do, represented in the left panel by different line thickness. However, van Noorden observed that this is not the case: The melodies do not “cross” but instead segregate into a higher and a lower melody (visualized in the right panel).
In the experiments described so far, sinusoidal tones were used. However, the coherence of tone sequences depends on timbre as well as pitch. A strik-ing demonstration can be given by replacstrik-ing one of the two crossstrik-ing se-quences of pulses with complex tones. Even a single harmonic is sufficient to hear the two sequences as actually crossing (indicated by the lines in the left panel of Fig. 3.9).
Another example of the role of timbre is illustrated in Fig. 3.10. The three panels represent three different conditions investigated by van Noorden. In the case on the left, a sinusoidal tone is alternated with a complex tone comprised of the third to the 10th harmonics of the tone. Both tones have, as we saw in chap-ter 2, the same pitch but they differ considerably in timbre. As a consequence of this difference, these tone pulses are not heard as a coherent stream. The mid-dle panel of Fig. 3.10 illustrates the case where alternating tone pulses represent three harmonics of the same fundamental frequency. Again, the repeated tone pulses are heard as two streams. For the case shown in the right panel, the three pulses differ slightly in fundamental frequency, and consequently in harmonics,
FIG. 3.9. Two sequences of alternating tone pulses crossing each other in frequency are only heard as crossing (panel A) if they differ in timbre; if not, they segregate in a lower and a higher melody (panel B).
but such that the differences in pitch as well as in timbre are small. In this third case the tone pulses fuse into a single stream.
These observations demonstrate clearly that there are rather strict condi-tions under which successive sounds of short duracondi-tions are perceived as a co-herent sound stream. The pitch, timbre, and loudness of segments are very important to the effect—none should differ much from segment to segment.
This certainly suggests that the auditory system is deciding whether or not the successive sounds had their origins in the same source or different sources. Small variations over time may be possible for a single source, but large changes may signal the presence of two (or more) sources.
This highly sophisticated way of processing illustrates again that the auditory system is fully equipped to separate superimposed sounds. It does not accept
“naively” the incoming sounds as a sequence of independent events, as a me-chanical analyzer would do. The goal of perception is clearly aimed at finding structure in the sounds, and grouping them according to criteria that match the nature of sound sources in the real world. Thus we see that sound perception is governed primarily by holistic rather than elementalistic principles.
FIG. 3.10. Panel A represents a sinusoidal tone alternating with a complex tone of the same pitch but different timbre (harmonics 3–10 only); Panel B, two alternating complex tones of the same pitch but different timbre (harmonics 3–5 vs. 8–10 only); Panel C, two alternating complex tones differing slightly in pitch and timbre (both harmonics 5–7 only). Conditions A and B are heard as two streams; condition C heard as one. The dashes represent the (absent) fundamental.
RESTORATION OF SPEECH SEGMENTS