3.2 Data preparation and analysis

3.2.1 Cleaning and refinement

The original transcriptions of the corpus included detailed representations of speakers’ speech properties (e.g., laughs, whispers, vocal noises, inhaling and exhaling) and other forms of gestural communication (e.g., demonstrating something, snaps his fingers three times). The transcriptions also included a few transcribers’ comments that pointed out places where they were uncertain about what the speakers said, or that described speakers’ actions (e.g., playing around with some kind of toy dog for about 11.8 seconds) or reactions (e.g., disgusted or moaning).

In order to analyse priming in L1 and L2 spoken interaction data, it is essential to determine the distance between potential primes and the subsequent primed production by the same speaker or their partner. As pointed out in section 2.4, prime-target distance has been measured in the literature by recording the time between the initial production of a construction and its subsequent production, or by counting the number of turns or sentences that occur between a prime and a target. The L1 and L2 data are highly interactive conversations in which turns contain several syntactic units that vary considerably in length and often contain typical aspects of speech, i.e., elliptical and incomplete utterances. Therefore, I adopted Foster et al.’s (2000) AS-units to divide the L1-L1 and L2-L2 conversations into basic syntactic units. Foster et al. (2000, p. 365) define an AS-unit as: “… a single speaker’s utterance consisting of an independent clause, or sub-clausal unit, together with any subordinate clause(s) associated with either”.
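Once a conversation has been divided into AS-units, counting the units that separate a prime from a target becomes a matter of comparing positions in the sequence. The following is a minimal sketch of that idea, not the procedure actually used in this study. The `ASUnit` class, the `variant` labels ("split", "joined") and the function name are hypothetical, and distance is assumed here to mean the number of intervening AS-units.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ASUnit:
    speaker: str                   # e.g. "A" or "B"
    text: str                      # cleaned text of the AS-unit
    variant: Optional[str] = None  # hypothetical label of the target construction, None if absent


def prime_target_distances(units: List[ASUnit]) -> List[Tuple[int, int, int]]:
    """Pair each occurrence of the construction with the previous occurrence.

    Returns (prime_index, target_index, distance) triples, where distance is
    the number of AS-units lying between the two (an assumption of this sketch).
    Units without the construction carry variant=None but still occupy a
    position, so they count towards the distance.
    """
    results = []
    last_index = None
    for i, unit in enumerate(units):
        if unit.variant is None:
            continue
        if last_index is not None:
            results.append((last_index, i, i - last_index - 1))
        last_index = i
    return results


conversation = [
    ASUnit("A", "hold them up with a gun", "split"),
    ASUnit("B", "yeah"),
    ASUnit("A", "that would be funny"),
    ASUnit("B", "and then pick up the money", "joined"),
]
distances = prime_target_distances(conversation)
```

In this invented conversation the prime is at position 0 and the target at position 3, so two AS-units separate them.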

Due to the highly interactive nature of the corpus, many instances of self-corrected forms were observed. These are forms that occur “… when the speaker identifies an error during or immediately following production and stops and reformulates the speech; self-corrections will therefore include an element of structural change…” (Foster et al., 2000, p. 368). Following Foster et al. (2000, p. 368), repetitions and self-corrected forms are normally removed, maintaining only the final version of the corrected form.

I will use Figure 3.3 below to further illustrate the treatment of repetitions, self-corrected forms, and all the other steps included in the refinement of the original transcription. The Excel sheet to the left is a screenshot of conversation number English124 taken before the transcription refinement process. The one to the right includes the same part of the conversation after it was cleaned and divided into AS-units (Foster et al., 2000).


Figure 3.3: Conversation English124, before and after transcription refinement

Step 1: Remove repetitions and self-corrected forms if they occurred within the boundaries of a larger AS-unit, e.g., ‘and then ...the g--’, sentence 29. This removal did not affect the distance between primes and targets because I maintained the rest of the units in which the self-corrected forms occurred.

Step 2: Maintain self-corrected forms that occurred as an AS-unit on their own. Lexically identical repetitions of an immediately preceding prime produced by any of the speakers are also maintained as an AS-unit, but excluded as potential primes and targets from the analysis (Fernández & Grimm, 2014, p. 465). This is important because these units count towards the number of units that separate a potential prime from a target, a count that is necessary in considering the distance question. Example (33) below was extracted from Figure 3.3 to illustrate the treatment of identical lexical repetitions.

(33) English124A: and like you know, hold them up with a gun
     English124A: hold them up with a gun


Speaker (A) in the first unit produces a prime, i.e. a sentence that includes one variant of the verb-particle construction. However, they seem to have repeated the exact same phrase where the prime occurred, as a way of perhaps using time to plan the next utterance they were going to say. The repetition in this case may be taken as a form of disfluency on the part of speaker (A), or perhaps it was intended to carry some rhetorical purpose. It has been argued that the identical repetition of the same variant of a prime may not be the result of priming, but rather a rhetorical discourse factor that induced the repetition of the syntactic construction (Branigan, 1995, p. 492; Costa et al., 2008, p. 535; Reitter, 2008, p. 17).

Step 3: Remove all transcription symbols and disfluency fillers, e.g., ‘erm’, sentence 20 in the original transcript, because they do not constitute constructions that might have affected either the speakers’ speech or the production of target alternations.

Step 4: Remove all transcribers’ comments, e.g., ‘suppresses a sneeze’, sentence 27 in the original transcript. Transcribers did not take part in the actual task, and their comments were added outside the task, during the transcription process.

Step 5: Maintain all discourse markers such as ‘kind of’ and ‘sort of’. These were maintained in the original speech but not parsed because they fall beyond the scope of this study. Discourse markers that include verbs, e.g., ‘You know…’ in sentence 19, however, were maintained and parsed using the same method as all other verb phrases (see section 9.3.2).

Step 6: Maintain minor spelling and grammatical inaccuracies because they do not affect the parsing and are a reflection of what was being transcribed (see Figure 3.4).


Figure 3.4: German 31: maintaining minor spelling mistakes

Sentence (139) in Figure 3.4 shows an example of a minor spelling mistake. It seems that speaker B wanted to use the word ‘minor’ to describe the characters. However, perhaps they produced an inaccurate pronunciation of the word. I maintained this mistake because it is minor, and it does not affect the parsing of the sentence.

Similarly, in sentence (164) in Figure 3.5, there is a grammatical mistake where a German speaker used the quantifier ‘many’, which does not agree with the uncountable noun ‘money’. This kind of minor grammatical error was maintained because it does not affect the parsing or the analysis.

Figure 3.5: German 68: maintaining minor grammatical errors

Step 7: Remove all small chunks, like ‘and her mom’ in Figure 3.3, because they do not constitute an AS-unit on their own. The same transcription refinement process was followed for the L1 and L2 conversations to ensure consistency in the analysis.
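Parts of Steps 2 to 4 lend themselves to a simple illustration in code. The sketch below is only an approximation under stated assumptions and is not the procedure followed in this study: it assumes that transcribers’ comments are enclosed in parentheses, uses a filler list of which only ‘erm’ is attested in the transcripts, and treats a unit as a lexically identical repetition when its words match the end of the immediately preceding unit. All function and variable names are hypothetical.

```python
import re

# Assumed filler list; only 'erm' is attested in the source transcripts.
FILLERS = {"erm", "er", "um", "uh"}
# Assumption: transcribers' comments appear in parentheses.
COMMENT = re.compile(r"\([^)]*\)")


def clean_text(text):
    """Drop parenthesised comments and filler tokens (Steps 3 and 4)."""
    text = COMMENT.sub(" ", text)
    tokens = [t for t in text.split() if t.strip(".,-").lower() not in FILLERS]
    return " ".join(tokens)


def words_of(text):
    """Lower-case the text and strip punctuation for comparison purposes."""
    return re.sub(r"[^\w\s]", "", text).lower().split()


def flag_repetitions(units):
    """Keep every unit, but mark identical repetitions as excluded (Step 2).

    `units` is a list of (speaker, text) pairs. Excluded units stay in the
    sequence, so they still count towards prime-target distance.
    """
    flagged = []
    previous = None
    for speaker, text in units:
        cleaned = clean_text(text)
        words = words_of(cleaned)
        is_repetition = (
            previous is not None
            and len(words) > 0
            and previous[-len(words):] == words
        )
        flagged.append({"speaker": speaker, "text": cleaned, "exclude": is_repetition})
        previous = words
    return flagged


example_33 = [
    ("English124A", "and like you know, hold them up with a gun"),
    ("English124A", "hold them up with a gun"),
]
result = flag_repetitions(example_33)
```

Applied to example (33), the second unit is kept in the sequence but flagged for exclusion as a potential prime or target. Decisions such as Step 1 and Step 7 depend on AS-unit boundaries and are not covered by this sketch.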

To sum up, I maintained the original speech as it is, removed all transcription symbols, and rearranged the speakers’ turns into AS-units that are relatively equal in length. These steps helped standardize the treatment of the L1-L1 and L2-L2 data by keeping the variation in the size of turns in each conversation transcription to a minimum. The next section will introduce a parsing model to quantify the priming of target structures in L1-L1 and L2-L2 spoken data.

In document Quantifying syntactic priming in oral production:a corpus based investigation into dyadic interaction of L1 L1 and L2 L2 speakers of English (Page 117-122)