Data collection: creating sub-corpora

3. Data and methodology

3.1. Data

3.1.2. Data collection: creating sub-corpora

The study at hand is based on a mixed-methods approach. Following traditional CA methodology, I first broke down the large ICE and ACE corpora to create small sub-corpora of everyday conversations, so-called ‘collections’. These sub-corpora consist of interactions adhering to the following conditions:

a) they involve three or more speakers,

b) they involve both male and female speakers as well as different age groups,

c) the recording quality allows for a detailed CA transcription,

d) the conversations are sufficiently long, and

e) the participants do not show an excessive orientation to the recording situation.

Condition a) guarantees that the effect of the observer’s paradox (Labov 1970: 32) on the interaction will be minimised – the more participants, the less they will focus on the recorder. Intelligibility poses a natural limit on the number of interactants, as more speakers usually result in split conversations which impede transcription. Most of the interactions I investigated thus involve 3-5 participants. Table 3.1 gives an overview of the conversations analysed and their respective numbers of participants.

Table 3.1: Number of interactants in the conversations analysed (ACE, ICE-JA, ICE-T&T)

Corpus    Conversation               Number of interactants    Total no. of interactants
                                     (individual convers.)     (sub-corpus)
ACE       SG_ED_con_4                3                         13
          SG_ED_con_6                3
          VN_LE_con_pho restaurant   7
ICE-JA    S1A-003                    3                         40
          S1A-004                    5
          S1A-006                    3
          S1A-010                    5
          S1A-013                    4
ICE-T&T   S1A-008                    4
          S1A-034                    3
          S1A-050                    5
          S1A-057                    3
          S1A-067                    5
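The sub-corpus totals in Table 3.1 can be retraced with a short tally. The following is only an illustrative sketch: the counts are copied from the table, while the variable names and the grouping of ICE-JA and ICE-T&T into one Caribbean sub-corpus (in line with the combined row in Table 3.2) are assumptions made for this example, not part of the original procedure.

```python
# Interactants per conversation, copied from Table 3.1.
interactants = {
    "ACE": {"SG_ED_con_4": 3, "SG_ED_con_6": 3, "VN_LE_con_pho restaurant": 7},
    "ICE-JA": {"S1A-003": 3, "S1A-004": 5, "S1A-006": 3, "S1A-010": 5, "S1A-013": 4},
    "ICE-T&T": {"S1A-008": 4, "S1A-034": 3, "S1A-050": 5, "S1A-057": 3, "S1A-067": 5},
}

# Assumption for this sketch: ICE-JA and ICE-T&T jointly form the Caribbean sub-corpus.
sub_corpora = {
    "Southeast Asian (ACE)": ["ACE"],
    "Caribbean (ICE-JA, ICE-T&T)": ["ICE-JA", "ICE-T&T"],
}

# Sum the interactants of all conversations belonging to each sub-corpus.
totals = {
    name: sum(sum(interactants[corpus].values()) for corpus in corpora)
    for name, corpora in sub_corpora.items()
}

# Smallest and largest group size across all conversations.
sizes = [n for conversations in interactants.values() for n in conversations.values()]
size_range = (min(sizes), max(sizes))

print(totals)
print(size_range)
```

Under these assumptions the tally yields 13 interactants for the ACE data and 40 for the combined ICE data, with group sizes between 3 and 7.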

As gender and age effects are not a primary focus of my analysis, I included both mixed and single-sex interactions (all-male and all-female) in the sub-corpora, and participants are from a variety of different age groups. Obviously, the data is biased to a certain extent – there are more female speakers than male – as can be seen in table 3.2.

Table 3.2: Sociolinguistic background of the conversations analysed (ACE, ICE-JA, ICE-T&T)

Corpus            Gender            Conversational setting                     Age span covered [13]
                  male    female    mixed    single-sex
ACE               3       10        2        1 (all-female)                    20-50 years
ICE-JA, ICE-T&T   15      25        4        2 (all-male), 5 [14] (all-female)   18-66+ [15] years

This bias is largely due to two factors: first, the corpora themselves often include more female speakers and more all-female conversations (cf. the overview in Rosenfelder et al. 2009 on ICE-JA); and second, as my data had to fulfil several criteria, the number of interactions suitable for analysis was limited.

In order to create detailed CA transcripts and to capture potentially relevant prosodic features as completely as possible, it is essential to have audio files with good recording quality. Unfortunately, this meant that I had to exclude a number of highly interesting and lively interactions from my analysis because of noise or poor sound quality. Some of the conversations which are included in my sub-corpora still pose some problems for transcription, e.g. because the recording took place in a noisy environment like a restaurant, or because of split and background conversations, which impede the data’s intelligibility (cf. VN_LE_con_pho restaurant in the Southeast Asian data). However, these noises arise out of the data’s nature and are a sign of the interaction’s naturalness – they therefore should be regarded as positive if not desirable, even if they obviously complicate the transcription process.

Condition d) ensures that I can check for speakers’ idiosyncrasies, as observing the participants’ behaviour over a longer period of time allows me to detect individual preferences, which might influence the analysis of the whole conversation. This is particularly relevant for studies with a qualitative focus and a small data base, i.e. in situations where the effects of individual speakers cannot be mitigated by comparing them to a large control group. Using longer interactions enables me to spot speaker-specific patterns, such as fast tempo, and to take them into account when analysing the conversations. Table 3.3 sums up the durations of the individual conversations as well as the sub-corpora.

[13] Unfortunately, ACE and ICE use different age spans to group their participants, which does not allow for a more detailed comparison.

[14] Conversation S1A-067 in ICE-T&T starts as a mixed interaction and ends as an all-female conversation. It is therefore mentioned in both columns.

Table 3.3: Length of individual interactions and duration of sub-corpora

Corpus    Conversation               Duration                  Total duration
                                     (individual convers.)     (sub-corpus)
ACE       SG_ED_con_4                01:00:59                  03:02:15
          SG_ED_con_6                01:02:56
          VN_LE_con_pho restaurant   00:58:20
ICE-JA    S1A-003                    00:12:31                  01:50:22
          S1A-004                    00:11:07
          S1A-006                    00:12:32
          S1A-010                    00:13:50
          S1A-013                    00:09:22
ICE-T&T   S1A-008                    00:11:18
          S1A-034                    00:09:23
          S1A-050                    00:07:17
          S1A-057                    00:11:30
          S1A-067                    00:11:21

As table 3.3 shows, the interactions taken from ACE are longer, so speaker-specific preferences are easier to spot and can be taken into account when analysing the conversations as such. Throughout SG_ED_con_4, for instance, participant Zhi’s speaking style is marked by a slower pace than that of her co-conversationalists, and she also tends to allow for longer pauses. Consequently, while situations such as (3.1) and (3.2) (taken from the beginning and the end of the one-hour conversation) are characteristic of Zhi’s personal way of talking, they do not reflect the interaction as such.

(3.1) From China (ACE, SG_ED_con_4)

01 Zhi: ((alveolar click))
02      (0.7)
03 Zhi: °I (1.2) come from China (0.3) and er° ((alveolar click))
        do my (0.3) er bachelor and master (1.0) the learning (.) period (0.4) er in China

(3.2) PhD (ACE, SG_ED_con_4)

01 An:  =so how many y(.)ears is he into his p h d?
        (0.3)
02 Zhi: ((alveolar click))(0.3) erm:::
03      (0.8)
04 An:  [(deep within)?]
05 Zhi: [I ↑think] (0.2) erm:: (0.2) <↑he still need>
07 Zhi: <almost er TWO ↑years>

Choosing longer conversations allows me, first of all, to detect these idiosyncrasies, and second, to take them into account when analysing the interaction as such. As Zhi’s talking style will not only be noticeable to the analyst but also to her co-conversationalists, this can explain why none of the other speakers starts up during her seemingly dysfluent utterance in example (3.2). For the Caribbean data group, ICE does not offer single conversations of similar length. However, as more speakers are included in this sub-corpus, idiosyncratic effects will be mitigated as well.
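How a speaker-specific pausing pattern such as Zhi’s could be made visible can be sketched in a few lines of code. The example below is a minimal illustration and not the procedure used in this study: it assumes that timed pauses are notated as numbers in single parentheses, as in examples (3.1) and (3.2), and it deliberately ignores micro-pauses marked as (.). The function name `timed_pauses` is hypothetical.

```python
import re

# Timed pauses in the transcripts are written as seconds in parentheses, e.g. "(1.2)".
# Micro-pauses "(.)" and comments such as "((alveolar click))" contain no digits
# directly after the opening parenthesis and are therefore not matched.
TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")


def timed_pauses(transcript_line):
    """Return all timed pauses (in seconds) found in one transcript line."""
    return [float(value) for value in TIMED_PAUSE.findall(transcript_line)]


# Zhi's turn in line 03 of example (3.1), including its continuation line.
zhi_turn = (
    "°I (1.2) come from China (0.3) and er° ((alveolar click)) "
    "do my (0.3) er bachelor and master (1.0) the learning (.) period (0.4) er in China"
)

pauses = timed_pauses(zhi_turn)
total_pause = round(sum(pauses), 1)  # rounded to the precision of the transcript
longest_pause = max(pauses)

print(pauses, total_pause, longest_pause)
```

Applied to every turn of a conversation and grouped by speaker, such values would allow a rough comparison of individual pausing behaviour. They complement the qualitative inspection of the transcripts and cannot replace it.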

Finally, I only chose conversations where the participants’ orientation to the recording device, the recording situation, or the fieldworker is minimal. Thus, I excluded conversations which openly or repeatedly addressed these factors. Interactions with minor and short mentions of the recording situation, such as shown in example (3.3), were treated as unproblematic and were therefore not excluded from the corpus. Again, this was meant to guarantee that the effect of the observer’s paradox on the data remained as small as possible.

(3.3) Radio (ICE-T&T, S1A-057)

01 Tre: <put it ↑on!>
02      (0.4)
03 Kat: >↑no be↑cause:< <then I’d have the> background ↑NOISE!
04      (0.3)
05 Kat: <so I’d to ↑TELL ↑you>
06      (0.3)
07 Kat: <I couldn’t just tell you ↑take off the ra[↓d i o]
08 Tre:                                           [((steups)) (↑(boy)] <↑TAKE off the ra:↑dio nah>

In document Patterns of Conversational Interaction in Varieties of English (Page 32-36)