Data Processing for Corpus Analysis

CHAPTER 3 RESEARCH METHODS

3.4 Data Processing for Corpus Analysis

To analyse the information collected through corpus-based approaches, all of the spoken data was transcribed with two major aims in mind: analysing the lexico-grammatical features in depth, and understanding the various pragmatic features within them. To this end, a verbatim transcription was produced first as a point of departure. It records all of the speakers’ utterances, including fillers that convey no propositional meaning, such as ‘em’, ‘er’ and ‘you know’. The data was then transcribed in more detail according to the VOICE (Vienna-Oxford International Corpus of English) transcription and spelling conventions, which were designed for transcribing different varieties of English in ELF communication, so that a wide range of pragmatic features could be analysed in depth.

3.4.1 Data Transcription (VOICE Transcription Scheme & Spelling Conventions)

The VOICE transcription scheme was designed around three requirements: the transcription should ‘capture the reality of spoken interactions as precisely as possible’, ‘be replicable...by other researchers’, and be ‘computer-readable’ (VOICE project, 2007). This detailed and precise transcription scheme seems to make an in-depth understanding of pragmatic features possible by allowing researchers to investigate a particular set of linguistic features beyond words, phrases and clauses, and to explore their distinctive communicative functions. In addition, the VOICE spelling convention, which is designed to analyse ‘the diversity of ELF speech in a standardized way’ (VOICE project, 2007), has been utilised and partially adapted for this research. The two schemes are summarised in Tables 9 and 10.

Table 9. VOICE transcription scheme utilised for this project (partially adapted)

Categories Example Description

1. SPEAKER IDS

Example: I1:, I2:
Description: Interviewers. I stands for Interviewer and is followed by the interviewer’s assigned number (e.g. 1, 2, 3).

Example: F1-P-ME:, F2-S-WA:, P1-I-ME:, P2-I-WA:
Description: Interviewees. The ID combines F (Fail) / P (Pass); the assigned number (e.g. 1, 2, 3…); a letter for the name of the country (e.g. P for the Philippines); M (Man) / W (Woman); and E (Engineering) / A (Administration).

2. INTONATION

Example:
I1: that’s what my next er slide? does
Description: Words spoken with rising intonation are followed by a question mark “?”.

Example:
I1: that’s point two. Absolutely yes.
Description: Words spoken with falling intonation are followed by a full stop “.”.

3. PAUSES

Example:
P20-V-ME: I'm:- I feel confident- (.) I (.) that's I can contribu:te in the your company (.)
Description: Every brief pause in speech (up to a good half second) is marked with a full stop in parentheses.

Example:
P20-V-ME: Of course, (.) I already know before. (2)
Description: Longer pauses are timed to the nearest second and marked with the number of seconds in parentheses, e.g. (1) = 1 second, (3) = 3 seconds.

4. OVERLAPS

Example:
P20-V-ME: Uh: I: am <1>for</1>
I1: <1>How</1> many years?
Description: Whenever two or more utterances happen at the same time, the overlaps are marked with numbered tags: <1></1>, <2></2>, … Everything that is simultaneous gets the same number.

Example:
I1: it is (.) to identify some<1>thing </1>where (.)
F1-P-ME: <1> mhm </1>
Description: All overlaps are approximate and words may be split up if appropriate. In this case, the tag is placed within the split word.

5. LATCHING

Example:
F1-P-ME: yes
I1: <=>really. so it’s it’s quite a lot of time.
Description: Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e. without a pause), this is marked by “<=>”.

6. LENGTHENING

Example:
I1: <=>Even though: you- your demand is, for example, I need only (.) one hundred [currency1]
Description: Lengthened sounds are marked with a colon “:”.

Example:
P20-V-ME: I have assistant member (.) uh: about two (thousand) from: in the mother:: [org2]. (.) It mean [org3] dockyard.
Description: Exceptionally long sounds (i.e. approximating 2 seconds or more) are marked with a double colon “::”.

7. REPETITION

Example:
I1: e:r i’d like to go t- t- to to this type of course
Description: All repetitions of words and phrases (including self-interruptions and false starts) are transcribed.

8. WORD FRAGMENTS

Example:
I1: <=>And please be in con- (.) confidence. hm? (3)
Description: With word fragments, a hyphen marks where a part of the word is missing.

9. LAUGHTER

Example:
F2-P-ME: I just want to know (.) what is the name of your company, <@>sir.</@>
I1: <@>@@@@@</@> You can ask to your agent.
Description: All laughter and laughter-like sounds are transcribed with the @ (i.e. ‘ha’, open laughter) or * (i.e. ‘hm’, throaty laughter) symbol, approximating syllable number (e.g. ha ha ha = @@@). Utterances spoken laughingly are put between <@></@> tags.

10. UNCERTAIN TRANSCRIPTION

Example:
F1-P-ME: I will comply with the (safety) policy (.) On morning, (.)
Description: Word fragments, words or phrases which cannot be reliably identified are put in parentheses ( ).

11. ANONYMISATION

Example: [P13] [F2/last]
Description: Whenever speakers who are involved in the interaction are addressed or referred to, their names are replaced by their respective speaker IDs. A speaker’s first name is represented by the plain speaker ID in square brackets [P1], etc. A speaker’s last name is marked [P1/last], etc. If a speaker’s full name is pronounced, the two tags are combined to [F1] [F1/last], etc.

Example: [first name3] [last name3]
Description: Names of people who are not part of the ongoing interaction are substituted by [first name1], etc. or [last name1], etc. or a combination of both.

Example: [org2]
Description: Companies and other organisations need to be anonymised as well. Their names are replaced by [org1], etc.

Example: [place12]
Description: Names of places, cities, countries, etc. are anonymised when this is deemed relevant in order to protect the speakers’ identities and their environment. They are replaced by [place1], etc.

Example: [name1]
Description: Other names or descriptors (e.g. Charles University) may be anonymised by [name1], etc.

12. UNINTELLIGIBLE SPEECH

Example:
F1-P-ME: uh grounding <un>XXX</un>, if there is some: (1) a (.) generator or (3) always use a safety: blanket in: making hard work. (4)
Description: Unintelligible speech is represented by x’s approximating syllable number and placed between <un></un> tags.
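Because the scheme is intended to be ‘computer-readable’, the annotations in Table 9 can in principle be extracted mechanically. The following Python sketch illustrates this under stated assumptions. It is not part of the VOICE tools or of the procedure used in this study. The function names (parse_speaker_id, markup_counts) and the regular expressions are hypothetical and are inferred only from the examples in Table 9. The sketch also assumes that the final letter of an interviewee ID is E for Engineering or A for Administration, as IDs such as ME and WA suggest.

```python
import re

# Interviewee IDs as exemplified in Table 9, e.g. "F1-P-ME":
# outcome (F/P) + number, country letter, gender (M/W) + department (E/A).
# These patterns are assumptions inferred from the examples, not official VOICE rules.
INTERVIEWEE_ID = re.compile(r"^([FP])(\d+)-([A-Z])-([MW])([EA])$")
INTERVIEWER_ID = re.compile(r"^I(\d+)$")


def parse_speaker_id(speaker_id):
    """Split a speaker ID into its labelled components."""
    m = INTERVIEWER_ID.match(speaker_id)
    if m:
        return {"role": "interviewer", "number": int(m.group(1))}
    m = INTERVIEWEE_ID.match(speaker_id)
    if m:
        outcome, number, country, gender, dept = m.groups()
        return {
            "role": "interviewee",
            "outcome": "fail" if outcome == "F" else "pass",
            "number": int(number),
            "country": country,
            "gender": "man" if gender == "M" else "woman",
            "department": "engineering" if dept == "E" else "administration",
        }
    raise ValueError(f"unrecognised speaker ID: {speaker_id!r}")


def markup_counts(utterance):
    """Count a selection of Table 9 annotations in one transcribed utterance."""
    return {
        # "(.)" marks a brief pause
        "brief_pauses": len(re.findall(r"\(\.\)", utterance)),
        # "(2)" marks a pause timed in seconds
        "timed_pauses_seconds": [int(s) for s in re.findall(r"\((\d+)\)", utterance)],
        # "<1>" opens a numbered overlap; closing tags are not matched
        "overlap_tags": sorted(set(re.findall(r"<(\d+)>", utterance))),
        # "<=>" marks latching
        "latched": "<=>" in utterance,
        # each x between <un></un> approximates one unintelligible syllable
        "unintelligible_syllables": sum(
            len(x) for x in re.findall(r"<un>\s*([xX]+)\s*</un>", utterance)
        ),
    }


line = ("F1-P-ME: uh grounding <un>XXX</un>, if there is some: (1) a (.) "
        "generator or (3) always use a safety: blanket in: making hard work. (4)")
speaker, text = line.split(": ", 1)
print(parse_speaker_id(speaker))
print(markup_counts(text))
```

Applied to the example utterance from category 12, the sketch reports one brief pause, timed pauses of 1, 3 and 4 seconds, and three unintelligible syllables. Uncertain transcriptions such as (safety) are not counted as pauses because only digits or a full stop inside parentheses are matched.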

Table 10. VOICE spelling conventions utilised for this project (partially adapted)

Categories Example Description

1. CHARACTERS

Example: a b c d e f g h i j k l m n o p q r s t u v w x y z
Description: Only alphabetic roman characters are used in the transcript. No diacritics, umlauts or non-roman characters are permitted in the running text.

2. BRITISH SPELLING

Description: British English spelling is used to represent naturally occurring ELF speech.

3. SPELLING EXCEPTIONS

Example: center, theater, behavior, color, favor, labor, neighbor, defense, offense, disk, program, travel (-l-: traveled, traveler, traveling)
Description: The 12 words listed in the example and all their derivatives are spelled according to American English conventions (e.g. colors, colorful, colored, to color, favorite, favorable, to favor, in favor of, etc.).

4. FULL REPRESENTATIONS OF WORDS

Example:
freely to enter (.) this kind of master knows (.) for example that he can (.) at the end achieve (.) sixty credits
Description: Although words may not be fully pronounced or may be pronounced with a foreign accent, they are generally represented in standard orthographic form.

5. CONTRACTIONS

Example: i’m, there’re, how’s peter, running’s fun, … i’ve, they’ve, it’s got, we’d been, … tom’ll be there, he’d go for the first, … we aren’t, i won’t, he doesn’t, … what’s it mean, where’s she live, how’s that sound … let’s
Description: All standard contractions are rendered whenever uttered. This refers to verb contractions with be (am, is, are), have (have, has, had), will and would, as well as contractions with not.

6. DISCOURSE MARKERS

Description: All discourse markers are represented in orthography as shown below. The lists provided are closed lists. The items in the lists are standardised and may not represent the exact sound patterns of the actual discourse markers uttered.

yes, yeah, yah (backchannels and positive minimal feedback; all lemmatised as yes in the frequency and keyword lists)
okay
mhm, hm (closed sound-acknowledgement token; all lemmatised as mhm in the frequency and keyword lists)
aha, uhu (open sound-acknowledgement token)
no (negative minimal feedback)
er, erm (hesitation/filler; all lemmatised as er in the frequency and keyword lists)
huh (tag-question/eliciting agreement)

Exclamations:
yay, yipee, whoohoo, mm: (joy/enthusiasm)
a:h, o:h, wow, poah (astonishment/surprise)
haeh (questioning/doubt/disbelief)
oops (apology)
ooph (exhaustion)
ts (click consonant)
ur (disapproval/disgust)
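The lemmatisation noted in Table 10 (yeah and yah counted as yes, hm as mhm, erm as er) can be illustrated with a short Python sketch. This is an assumption about how such a mapping might be applied before a frequency list is built, not a description of the tool settings used in this study. The names LEMMAS, tokenise and lemmatised_frequencies are hypothetical, and the simple tokeniser assumes that transcription mark-up such as lengthening colons and tags has already been removed.

```python
import re
from collections import Counter

# Variant -> lemma, following the "lemmatised as ..." notes in Table 10.
LEMMAS = {
    "yeah": "yes",
    "yah": "yes",
    "hm": "mhm",
    "erm": "er",
}


def tokenise(text):
    """Lower-case the text and keep alphabetic word forms, including contractions."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def lemmatised_frequencies(text):
    """Count word forms after mapping discourse-marker variants to their lemma."""
    return Counter(LEMMAS.get(token, token) for token in tokenise(text))


sample = "yeah erm i think yes er it is okay hm mhm yah"
freq = lemmatised_frequencies(sample)
print(freq["yes"], freq["er"], freq["mhm"])  # 3 2 2
```

Keeping the mapping in a single dictionary makes the closed lists easy to audit: any form not listed is counted under its own spelling.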

3.4.2 Data Analysis Software

The transcribed and annotated corpus data will be analysed using a recently updated concordance tool, AntConc (version 3.2.4w, from http://www.antlab.sci.waseda.ac.jp). With AntConc 3.2.4w, various linguistic features will be analysed using the corpus-based research methods discussed in 2.4.2.2 (i.e. frequent words, keywords, collocations and chunks), as detailed in the following section.
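To make three of these methods concrete, the following Python sketch computes a word frequency list, contiguous word sequences (one simple operationalisation of chunks) and a log-likelihood keyness score. It is offered only as an illustration of the underlying calculations: it is not AntConc’s implementation, the two toy corpora are invented for the example, and the choice of log-likelihood as the keyness statistic is an assumption, since the statistic used in this study is not specified here. Collocation measures are not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences (a simple notion of 'chunk')."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a, b, c, d):
    """Keyness of a word that occurs a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    if a:  # a zero count contributes nothing to the sum
        total += a * math.log(a / expected_study)
    if b:
        total += b * math.log(b / expected_reference)
    return 2 * total


# Toy corpora invented for illustration only.
study = "i feel confident i can contribute in your company".split()
reference = "you can ask your agent about the company".split()

word_freq = Counter(study)
bigram_freq = Counter(ngrams(study, 2))

print(word_freq.most_common(1))    # [('i', 2)]
print(bigram_freq[("i", "feel")])  # 1
print(log_likelihood(word_freq["i"], reference.count("i"), len(study), len(reference)))
```

A word with the same relative frequency in both corpora receives a score of zero, and the score grows as the relative frequencies diverge.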

In document Cross-cultural job interview communication in business English as a lingua franca (BELF) contexts: a corpus-based comparative study of multicultural job interview communications in world maritime industry (Page 86-91)