CHAPTER 3 RESEARCH METHODS
3.4 Data Processing for Corpus Analysis
To analyse the information collected through corpus-based approaches, all of the spoken data was transcribed with two major aims in mind: analysing the lexico-grammatical features in depth, and understanding the various pragmatic features within the data. For this, verbatim transcription, which records all of the speakers’ utterances, including fillers and discourse markers that carry little propositional meaning (e.g. em, er and you know), was conducted first as a point of departure. The data was then transcribed in more detail, following the VOICE (Vienna-Oxford International Corpus of English) transcription and spelling conventions, which were designed for transcribing different English varieties in ELF communication, so that a wide range of pragmatic features could be analysed in depth.
3.4.1 Data Transcription (VOICE Transcription Scheme & Spelling Conventions)
The VOICE transcription scheme was designed with three important points of emphasis in mind. These state that the transcription should ‘capture the reality of spoken interactions as precisely as possible’, ‘be replicable...by other researchers’, and be ‘computer-readable’ (VOICE project, 2007). This detailed and precise transcription scheme seems to make in-depth understanding of pragmatic features possible by allowing researchers to investigate a particular set of linguistic features beyond words, phrases and clauses, and explore their distinctive communicative functions. In addition, the VOICE spelling convention, which is designed to analyse ‘the diversity of ELF speech in a standardized way’ (VOICE project, 2007), has been utilised and partially adapted for this research. The two schemes are summarised in Tables 9 and 10.
Table 9. VOICE transcription scheme utilised for this project (partially adapted)
Categories Example Description
1. SPEAKER IDS
Interviewers
I1:
I2:
I for Interviewer; N. for the interviewer’s assigned number (e.g. 1, 2, 3…)
Interviewees
F1-P-ME:
F2-S-WA:
P1-I-ME:
P2-I-WA:
F (Fail)/P (Pass); 1 (assigned number, e.g. 1, 2, 3…); P (name of country, e.g. P for the Philippines); M (Man)/W (Woman); E (Engineering)/A (Administration)
2. INTONATION
I1: that’s what my next er slide? does
Words spoken with rising intonation are followed by a question mark “?”.
I1: that’s point two. Absolutely yes.
Words spoken with falling intonation are followed by a full stop “.”.
3. PAUSES
P20-V-ME : I'm:- I feel confident- (.) I (.) that's I can contribu:te in the your company (.)
Every brief pause in speech (up to a good half second) is marked with a full stop in parentheses.
P20-V-ME : Of course, (.) I already know before. (2)
Longer pauses are timed to the nearest second and marked with the number of seconds in parentheses, e.g. (1) = 1 second, (3) = 3 seconds.
4. OVERLAPS
P20-V-ME: Uh: I: am <1>for</1>
I1: <1>How</1> many years?
Whenever two or more utterances happen at the same time, the overlaps are marked with numbered tags: <1></1>, <2></2>, … Everything that is simultaneous gets the same number.
I1: it is (.) to identify some<1>thing </1>where (.)
F1-P-ME: <1> mhm </1>
All overlaps are approximate and words may be split up if appropriate. In this case, the tag is placed within the split word.
5. LATCHING
F1-P-ME: yes
I1: <=>really. so it’s it’s quite a lot of time.
Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e. without a pause), this is marked by “<=>”.
6. LENGTHENING
I1: <=>Even though: you- your demand is, for example, I need only (.) one hundred [currency1]
Lengthened sounds are marked with a colon “:”.
P20-V-ME: I have assistant member (.) uh: about two (thousand) from: in the mother:: [org2]. (.) It mean [org3] dockyard.
Exceptionally long sounds (i.e. approximating 2 seconds or more) are marked with a double colon “::”.
7. REPETITION
I1: e:r i’d like to go t- t- to to this type of course
All repetitions of words and phrases (including self-interruptions and false starts) are transcribed.
8. WORD FRAGMENTS
I1: <=>And please be in con- (.) confidence. hm? (3)
With word fragments, a hyphen marks where a part of the word is missing.
9. LAUGHTER
F2-P-ME: I just want to know (.) what is the name of your company, <@>sir.</@>
I1: <@>@@@@@</@> You can ask to your agent.
All laughter and laughter-like sounds are transcribed with the @ (i.e. ‘ha’, open laughter) or * (i.e. ‘hm’, throaty laughter) symbol, approximating syllable number (e.g. ha ha ha = @@@).
Utterances spoken laughingly are put between <@></@> tags.
10. UNCERTAIN TRANSCRIPTION
F1-P-ME: I will comply with the (safety) policy (.) On morning, (.)
Word fragments, words or phrases which cannot be reliably identified are put in parentheses ( ).
11. ANONYMISATION
[P13] [F2/last]
Whenever speakers who are involved in the interaction are addressed or referred to, their names are replaced by their respective speaker IDs. A speaker’s first name is represented by the plain speaker ID in square brackets [P1], etc. A speaker’s last name is marked [P1/last], etc. If a speaker’s full name is pronounced, the two tags are combined to [F1] [F1/last], etc.
[first name3] [last name3]
Names of people who are not part of the ongoing interaction are substituted by [first name1], etc. or [last name1], etc. or a combination of both.
[org2]
Companies and other organisations need to be anonymised as well. Their names are replaced by [org1], etc.
[place12]
Names of places, cities, countries, etc. are anonymised when this is deemed relevant in order to protect the speakers’ identities and their environment. They are replaced by [place1], etc.
[name1]
Other names or descriptors may be anonymised by [name1], etc., as in e.g. Charles University.
12. UNINTELLIGIBLE SPEECH
F1-P-ME: uh grounding <un>XXX</un>, if there is some: (1) a (.) generator or (3) always use a safety: blanket in: making hard work. (4)
Unintelligible speech is represented by x’s approximating the number of syllables and placed between <un></un> tags.
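Because the scheme in Table 9 is intended to be computer-readable, its mark-up can be processed automatically before corpus analysis. The following Python code is a minimal sketch of how this might be done, not the procedure actually used in this study. The function names (split_turn, parse_interviewee_id, clean_tokens) are hypothetical, and the treatment of each tag (e.g. discarding anonymisation tags and unintelligible speech, keeping uncertain words, keeping a trailing hyphen on word fragments) is an assumption made for illustration. The code also assumes that E stands for Engineering in the speaker IDs.

```python
import re

# Speaker ID at the start of a turn, e.g. "I1:" or "P20-V-ME :".
SPEAKER_RE = re.compile(r"^\s*([A-Z]\d+(?:-[A-Z]-[A-Z]{2})?)\s*:\s*(.*)$")
# Interviewee ID: outcome, assigned number, country letter, gender, field.
INTERVIEWEE_RE = re.compile(r"([FP])(\d+)-([A-Z])-([MW])([EA])")


def split_turn(line):
    """Return (speaker_id, utterance); speaker_id is None for continuation lines."""
    match = SPEAKER_RE.match(line)
    if match is None:
        return None, line.strip()
    return match.group(1), match.group(2)


def parse_interviewee_id(speaker_id):
    """Decode an interviewee ID such as 'P20-V-ME'; return None for other IDs."""
    match = INTERVIEWEE_RE.fullmatch(speaker_id)
    if match is None:
        return None
    outcome, number, country, gender, field = match.groups()
    return {
        "outcome": {"F": "fail", "P": "pass"}[outcome],
        "number": int(number),
        "country_code": country,
        "gender": {"M": "man", "W": "woman"}[gender],
        # Assumption: E = Engineering, A = Administration.
        "field": {"E": "engineering", "A": "administration"}[field],
    }


def clean_tokens(utterance):
    """Strip the Table 9 mark-up and return lower-case word tokens."""
    text = re.sub(r"<un>.*?</un>", " ", utterance)   # unintelligible speech
    text = re.sub(r"</?(?:\d+|@|=)>", "", text)      # overlap, laughter, latching tags
    text = re.sub(r"\((?:\.|\d+)\)", " ", text)      # brief and timed pauses
    text = re.sub(r"@+", " ", text)                  # laughter syllables
    text = re.sub(r"\(([^()]*)\)", r"\1", text)      # uncertain words are kept
    text = re.sub(r"\[[^\]]*\]", " ", text)          # anonymisation tags are dropped
    text = text.replace(":", "")                     # lengthening marks
    text = text.replace("’", "'")                    # unify apostrophes
    # A trailing hyphen is kept so that word fragments stay identifiable.
    return re.findall(r"[a-z]+(?:'[a-z]+)*-?", text.lower())
```

For example, split_turn applied to the turn "I1: it is (.) to identify some<1>thing </1>where (.)" separates the speaker ID I1 from the utterance, and clean_tokens then rejoins the split word, yielding it, is, to, identify, something, where.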
Table 10. VOICE spelling conventions utilised for this project (partially adapted)
Categories Example Description
1. CHARACTERS
a b c d e f g h i j k l m n o p q r s t u v w x y z
Only alphabetic Roman characters are used in the transcript. No diacritics, umlauts or non-Roman characters are permitted in the running text.
2. BRITISH SPELLING
British English spelling is used to represent naturally occurring ELF speech.
3. SPELLING EXCEPTIONS
center, theater, behavior, color, favor, labor, neighbor, defense, offense, disk, program, travel (-l-: traveled, traveler, traveling)
The 12 words listed above and all their derivatives are spelled according to American English conventions (e.g. colors, colorful, colored, to color, favorite, favorable, to favor, in favor of, etc.).
4. FULL REPRESENTATIONS OF WORDS
freely to enter (.) this kind of master knows (.) for example that he can (.) at the end achieve (.) sixty credits
Although words may not be fully pronounced or may be pronounced with a foreign accent, they are generally represented in standard orthographic form.
5. CONTRACTIONS
i’m, there’re, how’s peter, running’s fun, … i’ve, they’ve, it’s got, we’d been, … tom’ll be there, he’d go for the first, … we aren’t, i won’t, he doesn’t, … what’s it mean, where’s she live, how’s that sound … let’s
All standard contractions are rendered whenever uttered. This refers to verb contractions with be (am, is, are), have (have, has, had), will and would, as well as not-contractions.
6. DISCOURSE MARKERS
All discourse markers are represented in orthography as shown below. The lists provided are closed lists. The items in the lists are standardised and may not represent the exact sound patterns of the actual discourse markers uttered.
yes, yeah, yah
Backchannels and positive minimal feedback (all lemmatised as yes in the frequency and keyword lists)
okay
mhm, hm
Closed sound-acknowledgement token (all lemmatised as mhm in the frequency and keyword lists)
aha, uhu
Open sound-acknowledgement token
no
Negative minimal feedback
er, erm
Hesitation/filler (all lemmatised as er in the frequency and keyword lists)
huh
Tag question/eliciting agreement
yay, yipee, whoohoo, mm:
Exclamations/joy/enthusiasm
a:h, o:h, wow, poah
Astonishment/surprise
haeh
Questioning/doubt/disbelief
oops
Apology
ooph
Exhaustion
ts
Click consonant
ur
Disapproval/disgust
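The lemmatisation of discourse markers noted in Table 10 can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions, not as the procedure applied in this study. The names LEMMA_MAP, lemmatise_markers and frequency_list are hypothetical, and only the three groupings stated in the table are mapped (yeah and yah to yes, hm to mhm, erm to er); all other markers are left unchanged.

```python
from collections import Counter

# Variant forms mapped to the lemma stated in Table 10.
LEMMA_MAP = {
    "yeah": "yes",
    "yah": "yes",
    "hm": "mhm",
    "erm": "er",
}


def lemmatise_markers(tokens):
    """Replace each discourse-marker variant with its lemma; keep other tokens."""
    return [LEMMA_MAP.get(token, token) for token in tokens]


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, after lemmatisation."""
    return Counter(lemmatise_markers(tokens)).most_common()
```

With this mapping, a token list containing yeah, yes and yah contributes three occurrences to the single entry yes in the resulting frequency list, whereas okay, aha and uhu are counted under their own forms.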
3.4.2 Data Analysis Software
The transcribed and annotated corpus data will be analysed using a newly updated concordance tool, AntConc (version 3.2.4w from http://www.antlab.sci.waseda.ac.jp). With AntConc 3.2.4w, various linguistic features will be analysed by utilising the different corpus-based research methods discussed in 2.4.2.2 (i.e. frequent words, keywords, collocations and chunks), as detailed in the following section.
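To make two of these methods concrete, the following Python sketch shows how a concordance (key word in context) display and a count of chunks (contiguous word sequences) could be computed from a list of tokens. It is a simplified illustration only and does not reproduce AntConc or its internal procedures. The function names kwic and chunks and the default window and chunk sizes are assumptions made for this example.

```python
from collections import Counter


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for each occurrence of node."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


def chunks(tokens, n=2):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

Applied to the tokens of a cleaned utterance, kwic lists each occurrence of a search word with its neighbouring words, and chunks(tokens, 2) counts the two-word sequences from which recurrent chunks can be identified.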