Almut Koester
Table 6.1 VOICE Transcription Conventions 2.1 (from the VOICE website)

Pauses
  Code: Every brief pause in speech (up to a good half-second) is marked with a full stop in parentheses.
  Example: SX-f: because they all give me different (.) different (.) points of view
Other continuation
  Code: Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e. without a pause), this is marked by “=”.
  Example: S1: what up till (.) till twelve?
           S2: yes=
           S1: =really. so it’s it’s quite a lot of time.
Emphasis
  Code: If a speaker gives a syllable, word or phrase particular prominence, this is written in capital letters.
  Example: S3: toMORrow we have to work on the presentation already
Intonation
  Code: Words spoken with rising intonation are followed by a question mark “?”.
  Example: S1: that’s what my next er slide? does
  Code: Words spoken with falling intonation are followed by a full stop “.”.

5. What can be learnt from a small specialised corpus?
Having covered the issues involved in designing and compiling a small specialised corpus, in this section we will discuss the advantages of small corpora in terms of what can be learnt from them. As already mentioned at the beginning of this chapter, one of the main advantages of a small specialised corpus is that, unlike with a large corpus, the language is not de-contextualised. On the contrary, there is a very close link between language and context; as O’Keeffe (2007) says, ‘the texts behind the numbers are more accessible’.
Flowerdew (2008) points out that there are in fact two ways in which the context is relevant for corpus analysis:
1) The context can inform the corpus-based analysis, for example when the compiler-cum-analyst of a small specialised corpus has access to background information to aid in the interpretation of the data.
2) The linguistic patterns identified through corpus analysis can tell us something about the social and cultural context from which the data were taken.
For both types of contextual links between corpus and context, small specialised corpora have a clear advantage over large corpora. The first type was discussed in Section 4 on corpus compilation; here we will examine the second more closely.
O’Keeffe (2007) specifies the process by which patterns identified in a small, specialised corpus can reveal insights into the context of use. The patterns can first of all be linked to a particular context, because the corpus analysis shows that they are concentrated within that context. We can see that these patterns are localised, as they are traceable to local situational conditions, such as gender, power or discourse goal. As a result, the patterns can be linked to pragmatically specialised uses within that particular context of situation. Thus the ‘specificity of representation narrows and concentrates coverage and brings into clear focus signature uses of language in given contexts of use’ (ibid.). An example of a ‘signature use of language’ is given by O’Keeffe et al. (2007: 182) from the CANBEC corpus of business interactions. The corpus has many examples of the pattern going forward, for example:
make sure that your forecast going forward is actually correct
The use of this phrase, rather than a synonym such as ‘in the future’, can be seen as marking in-group membership in a business community; and the phrase can therefore be seen as a kind of ‘signature’ of that community of practice. Such localised uses would not necessarily show up in a larger corpus, where uses will be spread across a much greater range of contexts.
Let us now turn to some specific examples of how factors such as genre, topic or the relationship between the participants can influence local contexts of use. As we saw in Section 4, CANBEC was designed in such a way that information about the topic and purpose of the meetings, and the relationship between the speakers, is retrievable. Quantitative findings, such as frequency counts, can therefore be linked to these different factors. The use of the lexical items issue and problem in CANBEC provides an interesting illustration of the role such factors can play in influencing local contexts. These words are apparently synonyms and, looking at the corpus as a whole, their use seems very similar: they both have a very high frequency and enter into similar collocational patterns. However, the frequency of these two lexical items varies considerably when one looks at the topics discussed in the meetings and the relationship between the speakers (Handford 2007). Issue, for example, is more frequent in human resources and marketing meetings, whereas problem occurs most in procedural and technical meetings. In terms of speaker relationship, issue occurs more in interactions between managers and subordinates, whereas problem is used more in peer discussions. Handford gives the following example from a meeting between peers, in which both issue and problem are used, to illustrate how these two words actually perform slightly different functions:
Well I-I thi-think that’s another issue. And the other the and and another issue which comes on-onto that is that erm I’m still waiting he s-that cos (1.5 secs) Apparently one of the problems with getting some of the information off the computer is the fact that-erm that particular (1 sec) the s-the software is not as powerful as the stuff we’ve got on the the new computer that he’s got. (3 secs) There was an issue about getting the stuff off …
(Handford 2007: 252–3)
Handford (ibid.: 253) notes that while both words have the ‘prosody of difficulty’, problem seems to indicate more of a concrete obstacle, something that should be solved, whereas issue is somewhat more nebulous, and perhaps indicates that further discussion is needed. This fits with the nature of the meeting topics in which each of these words is more frequently used: in technical and procedural meetings, one would expect concrete problems to be raised, whereas in human resources and marketing wider discussion ‘around’ issues might be required. Furthermore, if we consider the interpersonal dimension of these words, problem comes across as more categorical, and its use could therefore potentially be face-threatening. This explains its higher frequency in peer meetings, where threats to face are less likely, thanks to the equal relationship between participants. In meetings between unequal participants (managers and subordinates), issue may be a useful euphemistic alternative to problem, serving to mitigate a potentially face-threatening act.
The use of these two apparent synonyms also varies with the nature of the activity or genre in CANBEC. Comparing their use in external meetings (between two different companies) and internal meetings, both words combined are more than twice as frequent in internal meetings as in external meetings. This can be explained by the fact that internal meetings typically focus on decision-making, where issues and problems are discussed and frequently resolved; whereas in external meetings decisions are often not made, but rather explained, contested or evaluated.
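The kind of comparison described above, in which counts of issue and problem are broken down by meeting topic or speaker relationship, could be sketched as follows. This is only a minimal illustration, assuming each transcript is stored with metadata fields; the field names (topic, relationship), the function names and the toy data are all hypothetical and are not taken from CANBEC or from Handford's analysis. Frequencies are normalised per 1,000 tokens so that categories of different sizes can be compared.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_category(texts, targets, key, per=1000):
    """Return {category: {lemma: frequency per `per` tokens}}.

    texts   -- list of dicts with a 'text' field plus metadata fields
    targets -- dict mapping a lemma to the set of word forms counted for it
    key     -- the metadata field to group by (hypothetical, e.g. 'topic')
    """
    hits = defaultdict(Counter)   # category -> lemma -> raw count
    sizes = Counter()             # category -> number of tokens
    for t in texts:
        tokens = tokenize(t["text"])
        sizes[t[key]] += len(tokens)
        for tok in tokens:
            for lemma, forms in targets.items():
                if tok in forms:
                    hits[t[key]][lemma] += 1
    return {
        cat: {lemma: per * hits[cat][lemma] / sizes[cat] for lemma in targets}
        for cat in sizes
        if sizes[cat]
    }


# Invented toy data, for illustration only (not taken from CANBEC).
toy_corpus = [
    {"topic": "technical", "relationship": "peer",
     "text": "The problem is the software. Another problem is the server."},
    {"topic": "human resources", "relationship": "manager-subordinate",
     "text": "There is an issue with the rota. The issues need more discussion."},
]
TARGETS = {"issue": {"issue", "issues"}, "problem": {"problem", "problems"}}

by_topic = frequency_by_category(toy_corpus, TARGETS, key="topic")
by_relationship = frequency_by_category(toy_corpus, TARGETS, key="relationship")
```

Grouping by a different metadata field only requires changing the key argument, which is why the design of the corpus, with contextual information recorded for each text, matters for this type of analysis.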
In the ABOT Corpus, a much smaller spoken corpus of workplace interactions (see Section 3), we can also observe the influence of local contexts on the frequency and use of various words and patterns. Both CANBEC and ABOT show that modals of obligation (have to, need to, should) are very frequent in workplace interactions (Koester 2006; Handford forthcoming). However, in both corpora, these modals, as well as their collocational patterns, are differentially distributed according to local contexts, such as genre and speaker relationship. The genres in ABOT are grouped into two ‘macro-genres’: unidirectional and collaborative. In unidirectional genres one of the speakers clearly plays a dominant role, for example imparting information or giving instructions. In collaborative genres, such as decision-making and planning, participants contribute more or less equally towards accomplishing the goal of the encounter. In the ABOT Corpus, all the modals of obligation are more frequent in collaborative genres than in unidirectional genres, as shown in Table 6.2 (Koester 2006).
Table 6.2 also shows that the difference in frequency is greater the stronger the modal: have to, which is the most forceful, occurs nearly twice as frequently in collaborative genres, whereas should, the least forceful, is only marginally more frequent there.
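As a rough check on this pattern, the ratios can be computed from the two rows of Table 6.2 that are given here (have to and need (to)). The sketch below assumes that the raw totals can be compared directly, i.e. that the two macro-genre sub-corpora are of comparable size, which the text does not state; the variable names are illustrative only.

```python
# Counts as given in Table 6.2: (collaborative, unidirectional).
table_6_2 = {"have to": (46, 26), "need (to)": (32, 22)}

# Ratio of collaborative to unidirectional occurrences for each modal.
ratios = {modal: collab / unidir for modal, (collab, unidir) in table_6_2.items()}

for modal, ratio in ratios.items():
    print(f"{modal}: {ratio:.2f} times as frequent in collaborative genres")
```

On these figures the ratio falls from about 1.77 for have to to about 1.45 for need (to), which is consistent with the claim that the gap narrows as the modal becomes less forceful.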
Moreover, collocational patterns of modals and pronoun combinations also vary systematically with genre. Thus in collaborative genres, we and you are the most frequent pronouns used with the above modals, whereas in unidirectional genres, I occurs most frequently in combination with all three modals. In unidirectional genres, you have to does not occur at all: there is just one example of you’ll have to and a few instances of you don’t have to.
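Pronoun and modal combinations of this kind could be retrieved with a simple pattern search. The following is a minimal sketch, not the procedure used for the ABOT Corpus: the regular expressions, the function name and the choice of pronouns are assumptions made for illustration. It keeps forms such as you’ll have to and you don’t have to distinct from you have to, since that distinction matters in the discussion above, and it also catches inverted should I. Other patterns, such as I need you to, would require a broader expression.

```python
import re
from collections import Counter

# Subject pronoun, optional auxiliary or negation, then a modal of obligation.
DECLARATIVE = re.compile(
    r"\b(i|you|we|they)"
    r"('ll| will| don't| didn't| won't)?"
    r"\s+(have to|need to|should)\b",
    re.IGNORECASE,
)
# Inverted question form, e.g. "should I".
INVERTED = re.compile(r"\bshould\s+(i|you|we|they)\b", re.IGNORECASE)


def pronoun_modal_counts(utterances):
    """Count pronoun + modal patterns in an iterable of utterance strings."""
    counts = Counter()
    for utt in utterances:
        text = utt.replace("’", "'")  # normalise curly apostrophes
        for pronoun, aux, modal in DECLARATIVE.findall(text):
            counts[f"{pronoun}{aux} {modal}".lower()] += 1
        for pronoun in INVERTED.findall(text):
            counts[f"should {pronoun}".lower()] += 1
    return counts


# Invented utterances, for illustration only.
sample = [
    "You have to check the figures first.",
    "We’ll have to meet again on Friday.",
    "You don't have to print it.",
    "Should I send it now?",
]
sample_counts = pronoun_modal_counts(sample)
```

Running the same function separately over the texts of each macro-genre would give the kind of comparison reported above.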
Both the lower frequency of the more forceful modals and the infrequent use of the pronoun you in combination with all three modals of obligation can be linked to the feature that all unidirectional genres have in common, namely the fact that one speaker plays a dominant role. Regardless of the actual social or institutional relationship between the speakers, this imbalance in the speakers’ roles means more care is taken to avoid face-threatening acts. This results in more indirect and hedged language, as illustrated in the following example, where a speaker makes a request using I need you to instead of you need to:
I need you to sign off on this pack too.
(Author’s data)
Another reason for the frequency of the first person pronoun I is that in procedural discourse/directive or instruction-giving (the most frequent unidirectional genre), the person receiving instructions frequently ‘invites’ directives by saying should I, e.g.:
What should I do. Just – get the estimate …
(Author’s data)
In collaborative genres, on the other hand, participants play a more equal role, and therefore more direct forms, such as you have to or you should, are unproblematic, e.g.:
You have to make sure you can get access to that. You need to update this too.
(Author’s data)
Also, most collaborative genres are action-orientated, i.e. people are trying to get things done (decisions, plans, arrangements), which results in the frequent use of modals of obligation with the first person pronoun we, e.g.:
Right. We’ll have to go through it. We need to get it moving.
(Author’s data)

Table 6.2 Total number of occurrences of modals of obligation in each macro-genre

            Collaborative   Unidirectional
have to     46              26
need (to)   32              22

Collocations can also take on specific pragmatic meanings or ‘semantic prosodies’ (see Flowerdew, this volume) within a specialised genre, and this is something corpus analysis can reveal. Flowerdew (2008) found that the collocation ‘associated with’ was very frequent in a corpus of environmental reports. Not only did it occur 139 times in the 250,000-word corpus, but it was found across all twenty-three companies from which the reports were drawn, indicating that this is a phrase that is typical for the genre, and not a result of ‘local prosody’ (see Section 3). In 135 of these instances, the phrase seemed to have a negative semantic prosody, for example:
difficulties associated with hydraulic dredging
Health hazards associated with proximity to high tension power lines.
(Flowerdew 2008: 121)
Flowerdew (2008: 121) concludes that this phrase is ‘most likely an attenuated form of “caused by”’ which is used by scientists to ‘avoid claiming a direct causal effect, thereby forestalling any challenges from their peers’, and therefore forms part of the discourse practices of the genre of environmental reports. In order to determine whether this finding is generalisable to other types of scientific writing, Flowerdew searched for the phrase ‘associated with’ in the much larger seven-million-word Applied Science domain of the British National Corpus, and found that in 40 per cent of the samples examined the phrase also has a negative semantic prosody. Such comparisons with a larger corpus covering a variety or genre similar to that of the smaller specialised corpus are very useful in testing the validity of findings from such corpora, and reinforcing the robustness of any generalisations made (see also Flowerdew 2003). By comparing a small corpus against a larger ‘benchmark’ corpus, ‘keywords’ can also be identified (e.g. using WordSmith Tools; Scott 1999): these are words that are unusually frequent in the small corpus compared to their normal frequency in the language (see Evison, this volume).
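The idea of a keyword comparison can be sketched as follows. The example uses the log-likelihood statistic, one measure commonly used for keyness; whether it matches the settings used in the studies cited is not stated in the text, so this should be read as an illustration under that assumption rather than as a reconstruction of any particular tool. The function names and toy word counts are hypothetical, and the threshold of 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(freq_small, size_small, freq_ref, size_ref):
    """Log-likelihood (G2) for one word across a small and a reference corpus."""
    total = size_small + size_ref
    combined = freq_small + freq_ref
    expected_small = size_small * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_small:
        ll += freq_small * math.log(freq_small / expected_small)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(small_counts, ref_counts, threshold=3.84):
    """Words relatively more frequent in the small corpus, sorted by keyness."""
    size_small = sum(small_counts.values())
    size_ref = sum(ref_counts.values())
    result = []
    for word, freq in small_counts.items():
        ref_freq = ref_counts.get(word, 0)
        # Keep only words over-represented in the small corpus.
        if freq / size_small > ref_freq / size_ref:
            score = log_likelihood(freq, size_small, ref_freq, size_ref)
            if score >= threshold:
                result.append((word, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


# Invented toy counts, for illustration only.
small = Counter({"forecast": 30, "the": 970})
benchmark = Counter({"forecast": 10, "the": 99990})
key_list = keywords(small, benchmark)
```

In the toy data, forecast is far more frequent relative to corpus size in the small corpus than in the benchmark, so it is returned as a keyword, whereas the is not.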
This chapter has shown that while small corpora are not suitable for all types of analysis, a small specialised corpus can nevertheless provide valuable insights into specific areas of language use, and can even have certain advantages over large corpora. The main advantage is in the close link that exists between language patterns and contexts of use, as illustrated throughout this chapter from corpus design, through compilation and transcription, to corpus analysis and findings. This interplay of language and context in corpus studies can be followed up in other chapters in this volume which deal with special areas of language use. Chapter 19 looks in more detail at what a corpus can tell us about specialist genres, and other chapters focus on specific genres: for example, Coxhead examines English for Academic Purposes, Cotterill looks at forensic linguistics, and Atkins and Harvey explore health communication.