Scenario
Today Mr. Park will work on developing sound and word recognition skills with his EFL learners. He announces that they will do a “Discovery Listening” activity today, using a text about the role of pets in people’s lives. After clarifying the topic, he plays the text, delivered at normal speed, and the learners listen without taking any notes. The learners self-assess their level of comprehension (e.g., 40%, 60%, 80%), note it on their worksheets, and then listen to the text a second time. This time they write down all key words. The learners listen a third time to expand their notes by adding as many details as possible.
Mr. Park now asks the class to work together in small groups to reconstruct the text. The members of each group use the notes from their worksheets to reconstruct the text in writing as closely as possible to its original form. The learners share what they have heard, pool together the bits of information understood, and discuss problems they encountered and gaps in understanding. As they share information and resolve comprehension differences, the learners focus on specific words and important grammatical details to reconstruct a text that represents the combined effort of the group.
When the groups have completed the task, Mr. Park asks them to compare their reconstructed version with a transcript of the original. He asks them to closely examine their errors and determine the cause of those errors, according to the categories on the worksheet (e.g., couldn’t hear the sound, couldn’t separate sound into words, new word to me, etc.). He encourages the learners to assess the relative seriousness of their errors in relation to the overall meaning of the text, and to pay attention to differences between the text and their reconstruction of it. The activity ends with a final listen (without transcript) and the learners record another self-assessment of overall level of comprehension, comparing this assessment with the level recorded at the beginning of the activity.¹
¹ Based on Wilson (2003).
144 A Metacognitive Approach to Listening
Pre-reading Reflection
1. How can this activity help listeners improve sound and word recognition skills?
2. Wilson (2003) claims that this activity helps learners discover their listening problems. How so?
3. To what degree does this activity develop listener metacognition? How is it similar to or different from the metacognitive sequence outlined in Chapter 6?
4. Why does Mr. Park have his learners do a pre- and post-self-assessment of their comprehension? What does that add to the development of their listening skills?
Introduction
For comprehension to happen, listeners need to parse the sound stream into meaningful units. This is challenging because the boundaries between words are often hard to determine. A common complaint of L2 listeners is that they have difficulty segmenting meaningful units from the sound stream. The acoustic signal comes so fast and then it is gone. In a study of listening difficulties reported by L2 listeners, eight of the ten problems were related to perception of the acoustic signal and segmenting words from the sound stream (Goh, 2000). This was particularly true for beginning- and intermediate-level listeners.
L2 learners also comment that reading is so much easier than listening comprehension. The most obvious difference between the two is the form in which the message is conveyed. Readers have the luxury of spaces that signal boundaries between words and the advantage of being able to return to the text. Listeners, on the other hand, need to do the hard work of segmenting the sound stream into meaningful units, without having the luxury of being able to re-examine the text. This adds to the cognitive burden of listening, compared with reading.
Indeed, the development of perception and word segmentation skills is an essential part of L2 listening development. Tsui & Fullilove (1998) observed that successful listeners need good perception and word segmentation skills because prior knowledge is not always adequate to compensate for unknown words in texts that do not always follow schemata precisely.
Perception and Word Segmentation Skills 145
This chapter will deal with the important question of how L2 listeners segment speech in the new language they are learning and how teachers can use this information to teach listening. We will begin by examining some of the research literature to better understand the decoding problems faced by listeners and the cues that learners find helpful to segment speech in the target language. The second part of the chapter will examine the unique features of spoken language, the differences between planned and unplanned speech, and the importance of choosing texts that are “listenable” to facilitate listening development. Finally, the chapter will present a number of techniques and teaching activities that can be used to develop perception and word segmentation skills, to help language learners become better listeners.
Word Segmentation: Research Findings
Decoding Challenges
It is helpful for teachers to be aware of what research has uncovered with regard to the decoding problems that learners face when they try to parse input in a new language. Three types of problems are summarized by Cross (2009a) as intrusion, processing, and text problems.
First of all, L2 listeners experience intrusion problems from their native language. Research shows that they tend to segment speech involuntarily on the basis of their L1 segmentation procedures (Cutler, 2001; Goh, 2000; Graham, 2006). These language-specific habits are acquired early in life and become so solidly engrained in the listener’s processing system that they are involuntarily transferred to listening in a new language, particularly at the beginning stages. This makes L2 listening particularly difficult when the new language is not rhythmically similar to the listener’s L1.
Second, L2 listeners experience processing problems in that they are unable to rapidly locate word boundaries. In the case of learning English, a stressed syllable appears to be a fairly reliable cue for word onset. Content words, as opposed to function words, appear to be more salient, likely due to the fact that these words tend to be stressed.
Finally, L2 listeners experience text problems in that they possess inadequate L2 vocabulary knowledge to quickly recognize words. Furthermore, they are often unable to recognize words they do know when these occur in rapid connected speech, because the form of a word may be altered from its form when spoken in isolation.
Cross analyzed the notes written by learners after each of two listens to news videotexts in a classroom. Based on his analysis, he suggests that the learners in this study could be helped by a greater awareness of the phonetic variations that can occur in connected, spoken English, discrimination of certain sounds, and revision of poor word choices by drawing on other evidence in the text. Other research studies provide teachers with a better understanding of the cues that listeners use to help them deal with an unfamiliar sound stream.
Cues for Word Segmentation
Research shows that listeners use a number of different cues to help them segment a sound stream into meaningful units: semantic/lexical, prosodic, allophonic, and phonotactic cues. Semantic/lexical cues refer to L2 words that listeners may already know and recognize in connected speech, including words from their own L1 that are recognizable orally in the target language. Obviously, these cues play a greater role as proficiency in the target language grows. At the beginning stages of listening, however, listeners are confronted with unknown chunks of speech and will often resort to prosodic cues. Prosodic cues are the stressed syllables, the pauses in the speech stream, and the tone groups between those pauses (Brown, 1990). Allophonic cues refer to different sounds associated with a single phoneme that listeners use for segmentation. Phonotactic cues refer to the specific clusters of consonants and vowels that are characteristic of the target language.
Semantic/Lexical Cues
Sanders, Neville, and Woldorff (2002) conducted some interesting experiments to determine the respective roles of semantic, syntactic, and prosodic information in segmenting speech. They began by preparing and recording three parallel versions of a sentence: a semantic, syntactic, and acoustic version, each meticulously matched on as many physical characteristics as possible (see Table 8.1). The semantic version was a normal sentence. In the syntactic version, content words were replaced with non-words, retaining only recognizable function words and syntactic information, such as –ed endings for past tense verbs or –s endings for plural nouns. Finally, the acoustic version was changed to retain English prosody but with unrecognizable non-words. All three sets of sentences were recorded using identical intonation. Table 8.1 shows the three versions of a sentence whose target (bottles) carries strong stress in initial position.
Both L1 speakers of English and L2 learners of English listened to the sentences and performed a segmentation task. Not surprisingly, the L1 English speakers were best able to detect the targets in the semantic sentences, then the syntactic sentences and, finally, the acoustic sentences (see Table 8.1). These listeners used multiple cues, relying more and more on stress and rhythm (acoustic) cues when lexical and syntactic cues were absent.
L2 speakers were, understandably, less accurate. They included speakers of languages that are rhythmically different from English: Japanese (a mora rhythmic language) and Spanish (a syllable rhythmic language). Although the advanced proficiency groups for both languages were less accurate than the L1 English speakers, they performed the segmentation task most accurately in the semantic sentences, likely because they had access to multiple cues. On the other hand, there was no difference in performance on the syntactic and acoustic sentences, suggesting that these L2 speakers did not attend to syntactic cues to perform the segmentation task. When these listeners had to rely on syntactic and acoustic cues only, they were most successful in identifying words that followed normal English stress patterns. There was a slight difference between the performance of Japanese and Spanish L2 speakers, likely due to the influence of stress patterns in their first language.
L1 speakers of Japanese and Spanish also performed the tasks. Not surprisingly, given their minimal acquaintance with any English words, these participants performed least well on the semantic sentences. The Japanese L1 speakers performed slightly better on the syntactic sentences than the acoustic versions. There was not much difference in the performance of the Spanish L1 speakers on the syntactic and acoustic sentences.
A more recent study by Lee and Cai (2010) on unfamiliar word processing arrived at similar results for the salience of semantic cues. This was particularly true for higher proficiency learners. Prosody cues were used more by the lower proficiency learners.
In sum, not surprisingly, semantics appears to be the most salient cue in perception and word segmentation. Furthermore, based on other reviews of similar research (e.g., Field, 2008b), we can affirm that L2 learners make little use of syntactic cues. This corroborates research on the role of syntactic knowledge in listening performance (see Chapter 4).
Table 8.1 Performance on Segmentation Tasks by Cue Type (based on Sanders et al., 2002)
Cue type   Sample sentence                                          L1 English   Japanese speakers   Spanish speakers
                                                                    speakers     L2 Eng    L1        L2 Eng    L1
Semantic   In order to recycle bottles you have to separate them.   1            1         3         1         3
Syntactic  In order to lefatal bokkers you have to thagamate them.  2            2*        1         2*        1*
Acoustic   Ah ilgen di lefatal bokkerth ha maz di thagamate fon.    3            2*        2         2*        1*
Legend: 1 = most accurate; 3 = least accurate; * = no difference
The prominence of semantic cues points to the importance of instruction in both lexical knowledge and word recognition skills for the L2 listener.
Prosodic Cues
In the absence of semantic cues, prosodic features of spoken language, such as stress and tone groups, become increasingly salient for determining word boundaries. Some research shows that calling attention to these features is helpful to L2 listeners. Cutler (2001) proposed that, when listening to English, listeners use a metrical segmentation strategy: that is, a stressed syllable will most likely signal the beginning of a new word. Cutler concluded this from extensive research in oral language processing, based on earlier seminal work by Brown (1990). In fact, based on analysis of a corpus of spoken English, Cutler calculated that 85.6 percent of content words in speech contain only one syllable or are stressed on the first syllable (Cutler & Carter, 1987). This would make stress, listening for strong syllables, a fairly reliable cue for detecting word onset in listening to English. In similar research, Harley (2000) found that L2 learners of English with two quite different first languages (Polish and Chinese) paid attention to prosodic rather than syntactic cues in listening to English, regardless of the age and language background of the listeners. These results corroborate the findings of Sanders et al. (2002) with regard to the minimal role of syntactic cues and the importance of stress in the absence of semantic cues for L2 learners.
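The logic of the metrical segmentation strategy can be illustrated with a minimal sketch. It assumes the sound stream has already been divided into syllables and that each syllable is marked as strong or weak; the function name, the input format, and the sample stream are hypothetical and are not taken from Cutler's work.

```python
def metrical_segment(syllables):
    """Group syllables into candidate words, positing a word onset at each strong syllable.

    `syllables` is a list of (text, is_strong) pairs. This input format is an
    assumption made for illustration only.
    """
    words = []
    for text, is_strong in syllables:
        if is_strong or not words:
            # A strong syllable (or the very first syllable) opens a new candidate word.
            words.append([text])
        else:
            # A weak syllable is attached to the candidate word currently being built.
            words[-1].append(text)
    return ["".join(parts) for parts in words]


# Hypothetical stream: "bottles have to be sorted"
stream = [
    ("bot", True), ("tles", False),
    ("have", True), ("to", False), ("be", False),
    ("sor", True), ("ted", False),
]
print(metrical_segment(stream))  # ['bottles', 'havetobe', 'sorted']
```

In this toy example the heuristic locates the onsets of the stressed content words, but it lumps the weak function words together with the preceding strong syllable, and it would wrongly split a word that begins with a weak syllable (e.g., "about"). This is consistent with describing stress as a fairly reliable, rather than infallible, cue for word onset.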
Allophonic Cues
A single phoneme may be produced in different ways, depending on its position within a word. For example, the phoneme t is aspirated in “keeps talking” but is unaspirated in “keep stalking.” These are allophones, and allophonic cues, like prosody, are language-specific and may not be perceptible by all learners of English. For example, Ito and Strange (2009) found that Japanese learners of English had difficulty exploiting allophonic cues for word segmentation purposes; however, their ability to perceive and use aspiration and glottal stops (e.g., separating “ice cream” from “I scream” in English) improved with increased length of residence in an English-speaking environment. Altenberg (2005) found similar difficulties for Spanish learners of English. Knowing that L2 listeners can learn to override the segmentation cues of their first language and to use allophonic and prosodic cues of the target language to successfully segment continuous speech suggests that these processes are amenable to instruction (Cutler, 2001).
Phonotactic Cues
Each language has its own phonotactic constraints: that is, certain sound sequences cannot appear in a syllable or may only appear at either the beginning or end of that syllable. For example, in English the cluster rt can appear at the end of a syllable such as shirt but cannot be the onset of a word, whereas the opposite is true of cr as in crust. In order to test the degree to which L2 learners make use of this information in word segmentation, researchers use a word spotting task. Participants listen to a stream of sound that is nonsense and identify any target language words they hear. The same word is presented in different acoustic contexts, each representative of the language of interest. Responses are documented for both speed of reaction time and accuracy of recognition. Of particular interest to the researcher is the degree to which the phonotactic constraints of L1 facilitate or interfere with the constraints of L2.
As expected, research shows that listeners are more accurate in spotting words with boundaries that are prevalent in their L1. In their study of German learners of English and L1 English speakers, Weber and Cutler (2006) determined that, where word boundaries in English and German were similar, both listener groups performed equally well in identifying the English target word. When word boundaries were in the English condition only, both groups scored well; however, when words were in the German boundary condition, only the German group scored well. The English group had difficulty, even though the target word was English. Weber and Cutler conclude that this is good news for L2 learners: L2 listeners can approximate the word segmentation strategies of L1 listeners, although it is not clear how much experience with the target language is necessary in order for L2 listeners to suppress L1 probabilities when listening to L2.
Building on this study, Al-jasser (2008) noted similar effects for L1 speakers of Arabic and conducted an eight-week study to teach them English phonotactics. After instruction, these L2 learners of English were able to more quickly detect target words in the English boundary condition.
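How phonotactic constraints can narrow down the location of a word boundary may be sketched as follows. The sketch uses spelling as a rough stand-in for sounds, and the two small sets of permitted onsets and codas are illustrative assumptions built around the rt and cr examples above; they are not an inventory of English phonotactics.

```python
# Illustrative subsets only (an assumption for this sketch), written in ordinary spelling.
LEGAL_ONSETS = {"", "c", "r", "t", "cr", "tr"}   # clusters allowed to begin a syllable
LEGAL_CODAS = {"", "c", "r", "t", "rt"}          # clusters allowed to end a syllable


def boundary_splits(cluster):
    """Return every (coda, onset) division of a consonant cluster that both sets permit."""
    splits = []
    for i in range(len(cluster) + 1):
        coda, onset = cluster[:i], cluster[i:]
        if coda in LEGAL_CODAS and onset in LEGAL_ONSETS:
            splits.append((coda, onset))
    return splits


# The consonants between the vowels of "shirt crust" form the cluster "rtcr".
print(boundary_splits("rtcr"))  # [('rt', 'cr')]
```

Under these assumptions, the only permitted division of the cluster places the boundary between rt and cr, which is where the word boundary in "shirt crust" actually falls. A listener applying the constraints of a different L1 would, in effect, be working with different sets and could therefore posit the boundary elsewhere.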
Factors in Word Recognition
Word recognition involves an interwoven process of word segmentation and word activation (Rost, 2005). As listeners identify word boundaries, they attempt to activate a likely word candidate. How does the listener arrive at the best match? Based on his review of the literature, Cross (2009b) identified five interrelated factors that affect speed and accuracy in finding the best fit for a word: context and co-text of the utterance, density (number of potential competing words), frequency of occurrence of the word in the target language, recent activation of the word by the listener, and spreading activation of a network of associated words.
Probably the most important cue for the listener is the context of the utterance. Many segmentation challenges (e.g., “ice cream”/“I scream”) are easily resolved by the larger context within the oral text, the co-text (what has been understood so far of the whole text) or the context in which the utterance is spoken. However, it is important to distinguish between lexical, syntactic or semantic contexts, and to consider how and when context has its effect.
Other factors are also important. Density is an important factor in that the onset of some words will activate many candidates. Some words take much longer to differentiate from other potential candidates; for example, a word beginning with a consonant cluster such as scr will be more quickly resolved than a word beginning with a consonant/vowel combination such as li. Words that occur more frequently in the target language are recognized more quickly and accurately. Not surprisingly, this includes mostly function words such as the and it, prepositions such as to and with, pronouns such as you and his, and content words such as hot and make. In the same vein, words that have most recently been activated by the listener, because there is still a trace in memory, will be more quickly activated.
Finally, activation of a network of topic-related words makes it easier to activate the correct word. This spreading activation of a network of words explains why contextualization before listening is so important. Discussion of the topic or reading something about the topic activates a network of words that can be more quickly accessed in a subsequent listening activity. It also explains why active prediction of words and/or content, as practiced in the metacognitive pedagogical sequence, is even more effective.
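The density factor described above can be made concrete with a small sketch that counts how many words remain as candidates after a given onset. The word list is a hypothetical toy lexicon, and letters are used as a rough proxy for sounds; both are assumptions made for illustration only.

```python
# Hypothetical toy lexicon (an assumption for this sketch), in ordinary spelling.
LEXICON = [
    "scrap", "scream", "screen", "script",
    "lid", "life", "light", "like", "limit",
    "line", "list", "listen", "little", "live",
]


def candidates(onset, lexicon=LEXICON):
    """Return the words in the lexicon that are still compatible with the onset heard so far."""
    return [word for word in lexicon if word.startswith(onset)]


def letters_to_unique(word, lexicon=LEXICON):
    """Return how many initial letters are needed before `word` is the only candidate.

    Returns None if the word never becomes unique (e.g., it is the start of a longer word).
    """
    for n in range(1, len(word) + 1):
        if candidates(word[:n], lexicon) == [word]:
            return n
    return None


print(len(candidates("scr")))       # 4
print(len(candidates("li")))        # 10
print(letters_to_unique("listen"))  # 5
print(letters_to_unique("list"))    # None
```

In this toy lexicon the onset scr leaves four competitors while li leaves ten, so a word beginning with li has more candidates to be distinguished from. A word such as "list", which is also the beginning of "listen", cannot be resolved from its sounds alone, which suggests why context and the other factors also matter.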
Word Segmentation: Synthesis
What can we learn from these studies on the problems and the cues listeners use to segment connected speech? Two types of research provide useful background for teachers. The research on specific cues and factors in word recognition, done in highly controlled conditions, provides technical insights and remediation suggestions. However, laboratory conditions in these studies rob the listeners of broader contextual support and therefore lack ecological validity for real-life listening. They also encourage word-for-word listening, which is a less productive strategy in real life.
Research that focuses on identifying decoding errors, on the other hand,