What can be learnt from a small specialised corpus?

Almut Koester

5. What can be learnt from a small specialised corpus?

Table 6.1 VOICE Transcription Conventions 2.1 (from the VOICE website)

Pauses
Code: Every brief pause in speech (up to a good half-second) is marked with a full stop in parentheses.
Example: SX-f: because they all give me different (.) different (.) points of view

Other continuation
Code: Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e. without a pause), this is marked by “=”.
Example: S1: what up till (.) till twelve?
         S2: yes=
         S1: =really. so it’s it’s quite a lot of time.

Emphasis
Code: If a speaker gives a syllable, word or phrase particular prominence, this is written in capital letters.
Example: S3: toMORrow we have to work on the presentation already

Intonation
Code: Words spoken with rising intonation are followed by a question mark “?”.
Example: S1: that’s what my next er slide? does
Code: Words spoken with falling intonation are followed by a full stop “.”.

Having covered the issues involved in designing and compiling a small specialised corpus, in this section we will discuss the advantages of small corpora in terms of what can be learnt from them. As already mentioned at the beginning of this chapter, one of the main advantages of a small specialised corpus is that, unlike with a large corpus, the language is not de-contextualised. On the contrary, there is a very close link between language and context; as O’Keeffe (2007) says, ‘the texts behind the numbers are more accessible’.

Flowerdew (2008) points out that there are in fact two ways in which the context is relevant for corpus analysis:

1) The context can inform the corpus-based analysis, for example when the compiler-cum-analyst of a small specialised corpus has access to background information to aid in the interpretation of the data.

2) The linguistic patterns identified through corpus analysis can tell us something about the social and cultural context from which the data were taken.

For both types of link between corpus and context, small specialised corpora have a clear advantage over large corpora. The first type was discussed in Section 4 on corpus compilation; here we will examine the second more closely.

O’Keeffe (2007) specifies the process by which patterns identified in a small, specialised corpus can reveal insights into the context of use. The patterns can first of all be linked to a particular context, because the corpus analysis shows that they are concentrated within that context. We can see that these patterns are localised, as they are traceable to local situational conditions, such as gender, power or discourse goal. As a result, the patterns can be linked to pragmatically specialised uses within that particular context of situation. Thus the ‘specificity of representation narrows and concentrates coverage and brings into clear focus signature uses of language in given contexts of use’ (ibid.). An example of a ‘signature use of language’ is given by O’Keeffe et al. (2007: 182) from the CANBEC corpus of business interactions. The corpus has many examples of the pattern going forward, for example:

make sure that your forecast going forward is actually correct

The use of this phrase, rather than a synonym such as ‘in the future’, can be seen as marking in-group membership in a business community; and the phrase can therefore be seen as a kind of ‘signature’ of that community of practice. Such localised uses would not necessarily show up in a larger corpus, where uses will be spread across a much greater range of contexts.

Let us now turn to some specific examples of how factors such as genre, topic or the relationship between the participants can influence local contexts of use. As we saw in Section 4, CANBEC was designed in such a way that information about the topic and purpose of the meetings, and the relationship between the speakers, is retrievable. Quantitative findings, such as frequency counts, can therefore be linked to these different factors. The use of the lexical items issue and problem in CANBEC provides an interesting illustration of the role such factors can play in influencing local contexts. These words are apparently synonyms and, looking at the corpus as a whole, their use seems very similar: they both have a very high frequency and enter into similar collocational patterns. However, the frequency of these two lexical items varies considerably when one looks at the topics discussed in the meetings and the relationship between the speakers (Handford 2007). Issue, for example, is more frequent in human resources and marketing meetings, whereas problem occurs most in procedural and technical meetings. In terms of speaker relationship, issue occurs more in interactions between managers and subordinates, whereas problem is used more in peer discussions. Handford gives the following example from a meeting between peers, in which both issue and problem are used, to illustrate how these two words actually perform slightly different functions:

Well I-I thi-think that’s another issue. And the other the and and another issue which comes on-onto that is that erm I’m still waiting he s-that cos (1.5 secs) Apparently one of the problems with getting some of the information off the computer is the fact that-erm that particular (1 sec) the s-the software is not as powerful as the stuff we’ve got on the the new computer that he’s got. (3 secs) There was an issue about getting the stuff off …

(Handford 2007: 252–3)

Handford (ibid.: 253) notes that while both words have the ‘prosody of difficulty’, problem seems to indicate more of a concrete obstacle, something that should be solved, whereas issue is somewhat more nebulous, and perhaps indicates that further discussion is needed. This fits with the nature of the meeting topics in which each of these words is more frequently used: in technical and procedural meetings, one would expect concrete problems to be raised, whereas in human resources and marketing wider discussion ‘around’ issues might be required. Furthermore, if we consider the interpersonal dimension of these words, problem comes across as more categorical, and its use could therefore potentially be face-threatening. This explains its higher frequency in peer meetings, where threats to face are less likely, thanks to the equal relationship between participants. In meetings between unequal participants (managers and subordinates), issue may be a useful euphemistic alternative to problem, serving to mitigate a potentially face-threatening act.

The use of these two apparent synonyms also varies with the nature of the activity or genre in CANBEC. Comparing their use in external meetings (between two different companies) and internal meetings, the two words combined are more than twice as frequent in internal meetings as in external meetings. This can be explained by the fact that internal meetings typically focus on decision-making, where issues and problems are discussed and frequently resolved; whereas in external meetings decisions are often not made, but rather explained, contested or evaluated.
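Linking frequency counts to contextual factors in this way can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the transcripts and topic labels are invented (they are not CANBEC data), the function name rate_per_thousand is hypothetical, and tokenisation is deliberately crude, with no lemmatisation, so that plural forms such as problems are not counted. Frequencies are normalised per 1,000 tokens so that contexts of different sizes can be compared.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy data: (meeting topic, transcript text). Not CANBEC data.
meetings = [
    ("technical", "the problem is the software and another problem is the server"),
    ("human resources", "there is an issue about leave and a wider issue about pay"),
    ("technical", "one issue but mainly a problem with the printer"),
]

def rate_per_thousand(meetings, targets):
    """Return {topic: {word: occurrences per 1,000 tokens}} for the target words."""
    counts = defaultdict(Counter)  # topic -> counts of the target words
    sizes = Counter()              # topic -> total number of tokens
    for topic, text in meetings:
        tokens = re.findall(r"[a-z']+", text.lower())
        sizes[topic] += len(tokens)
        counts[topic].update(t for t in tokens if t in targets)
    return {
        topic: {w: 1000 * counts[topic][w] / sizes[topic] for w in targets}
        for topic in sizes
    }

rates = rate_per_thousand(meetings, {"issue", "problem"})
for topic, by_word in rates.items():
    print(topic, by_word)
```

The same grouping could be applied to any other piece of metadata recorded at the design stage, such as speaker relationship or whether a meeting is internal or external.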

In the ABOT Corpus, a much smaller spoken corpus of workplace interactions (see Section 3), we can also observe the influence of local contexts on the frequency and use of various words and patterns. Both CANBEC and ABOT show that modals of obligation (have to, need to, should) are very frequent in workplace interactions (Koester 2006; Handford forthcoming). However, in both corpora, these modals, as well as their collocational patterns, are differentially distributed according to local contexts, such as genre and speaker relationship. The genres in ABOT are grouped into two ‘macro-genres’: unidirectional and collaborative. In unidirectional genres one of the speakers clearly plays a dominant role, for example imparting information or giving instructions. In collaborative genres, such as decision-making and planning, participants contribute more or less equally towards accomplishing the goal of the encounter. In the ABOT Corpus, all the modals of obligation are more frequent in collaborative genres than in unidirectional genres, as shown in Table 6.2 (Koester 2006).

Table 6.2 also shows that the difference in frequency is greater the stronger the modal: have to, which is the most forceful, occurs nearly twice as frequently in collaborative genres, whereas should, the least forceful, is only marginally more frequent there.
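The size of these differences can be checked directly from the counts in Table 6.2. The short sketch below uses only the two rows reproduced in the table here (have to and need (to)); the figures for should are not given in this extract and are therefore left out. It compares raw counts, which assumes that the two macro-genre sub-corpora are of broadly similar size; that assumption is made for the purpose of illustration and is not stated in the text above.

```python
# Raw counts as given in Table 6.2 (Koester 2006); the row for "should"
# is not reproduced in this extract and is deliberately omitted.
table_6_2 = {
    "have to": {"collaborative": 46, "unidirectional": 26},
    "need (to)": {"collaborative": 32, "unidirectional": 22},
}

def ratio(modal):
    """Collaborative count divided by unidirectional count for one modal."""
    row = table_6_2[modal]
    return row["collaborative"] / row["unidirectional"]

for modal in table_6_2:
    print(f"{modal}: {ratio(modal):.2f} times as frequent in collaborative genres")
```

On these figures the ratio is about 1.77 for have to and about 1.45 for need (to), which is consistent with the observation that the gap widens as the modal becomes more forceful.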

Moreover, collocational patterns of modals and pronoun combinations also vary systematically with genre. Thus in collaborative genres, we and you are the most frequent pronouns used with the above modals, whereas in unidirectional genres, I occurs most frequently in combination with all three modals. In unidirectional genres, you have to does not occur at all: there is just one example of you’ll have to and a few instances of you don’t have to.
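Pronoun and modal combinations of this kind can be retrieved with a simple pattern search. The following is a minimal sketch rather than the procedure used in the study: the utterances are invented for illustration (they are not ABOT data), the names PATTERN and pronoun_modal_counts are hypothetical, and the pattern covers only I, you and we with have to, need to and should, including the contracted ’ll and the inverted question order should I. Negated forms such as you don’t have to are deliberately not matched, since the text treats them separately.

```python
import re
from collections import Counter

# Hypothetical utterances for illustration; not taken from the ABOT Corpus.
utterances = [
    "We need to get it moving",
    "You have to make sure you can get access",
    "What should I do",
    "I need to check that first",
    "We'll have to go through it",
]

PATTERN = re.compile(
    r"\b(i|you|we)(?:['’]ll)?\s+(have to|need to|should)\b"  # e.g. "we'll have to"
    r"|\b(should)\s+(i|you|we)\b",                            # inverted: "should I"
    re.IGNORECASE,
)

def pronoun_modal_counts(utterances):
    """Count (pronoun, modal) pairs, lower-cased, across all utterances."""
    counts = Counter()
    for utterance in utterances:
        for m in PATTERN.finditer(utterance):
            pronoun = (m.group(1) or m.group(4)).lower()
            modal = (m.group(2) or m.group(3)).lower()
            counts[(pronoun, modal)] += 1
    return counts

counts = pronoun_modal_counts(utterances)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Running the same count separately on the texts of each macro-genre would give the kind of distribution described above.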

Both the lower frequency of the more forceful modals and the infrequent use of the pronoun you in combination with all three modals of obligation can be linked to the feature that all unidirectional genres have in common, namely the fact that one speaker plays a dominant role. Regardless of the actual social or institutional relationship between the speakers, this imbalance in the speakers’ roles means more care is taken to avoid face-threatening acts. This results in more indirect and hedged language, as illustrated in the following example, where a speaker makes a request using I need you to instead of you need to:

I need you to sign off on this pack too.

(Author’s data)

Another reason for the frequency of the first person pronoun I is that in procedural discourse/directive or instruction-giving (the most frequent unidirectional genre), the person receiving instructions frequently ‘invites’ directives by saying should I, e.g.:

What should I do. Just – get the estimate …

(Author’s data)

In collaborative genres, on the other hand, participants play a more equal role, and therefore more direct forms, such as you have to or you should, are unproblematic, e.g.:

You have to make sure you can get access to that. You need to update this too.

(Author’s data)

Also, most collaborative genres are action-orientated, i.e. people are trying to get things done (decisions, plans, arrangements), which results in the frequent use of modals of obligation with the first person pronoun we, e.g.:

Right. We’ll have to go through it. We need to get it moving.

(Author’s data)

Table 6.2 Total number of occurrences of modals of obligation in each macro-genre

            Collaborative   Unidirectional
have to     46              26
need (to)   32              22

Collocations can also take on specific pragmatic meanings or ‘semantic prosodies’ (see Flowerdew, this volume) within a specialised genre, and this is something corpus analysis can reveal. Flowerdew (2008) found that the collocation associated with was very frequent in a corpus of environmental reports. Not only did it occur 139 times in the 250,000-word corpus, but it was found across all twenty-three companies from which the reports were drawn, indicating that this is a phrase that is typical for the genre, and not a result of ‘local prosody’ (see Section 3). In 135 of these instances, the phrase seemed to have a negative semantic prosody, for example:

difficulties associated with hydraulic dredging

Health hazards associated with proximity to high tension power lines.

(Flowerdew 2008: 121)

Flowerdew (2008: 121) concludes that this phrase is ‘most likely an attenuated form of “caused by”’ which is used by scientists to ‘avoid claiming a direct causal effect, thereby forestalling any challenges from their peers’, and therefore forms part of the discourse practices of the genre of environmental reports. In order to determine whether this finding is generalisable to other types of scientific writing, Flowerdew searched for the phrase associated with in the much larger seven-million-word Applied Science domain of the British National Corpus, and found that in 40 per cent of the samples examined the phrase also has a negative semantic prosody. Such comparisons with a larger corpus covering a variety or genre similar to that of the smaller specialised corpus are very useful in testing the validity of findings from such corpora, and reinforcing the robustness of any generalisations made (see also Flowerdew 2003). By comparing a small corpus against a larger ‘benchmark’ corpus, ‘keywords’ can also be identified (e.g. using WordSmith Tools; Scott 1999): these are words that are unusually frequent in the small corpus compared to their normal frequency in the language (see Evison, this volume).
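Two calculations underlie comparisons of this kind, and both can be sketched briefly. The first normalises a raw count to a rate per million words, so that corpora of different sizes can be compared; applied to the figures reported above, it gives the rate of associated with in the environmental reports and the share of negative instances. The second is a log-likelihood keyness score, one statistic commonly used to decide whether a word is unusually frequent relative to a benchmark corpus. It is offered here as an assumption about how such a comparison might be made, not as the setting used in the studies cited, and the function names are hypothetical.

```python
import math

def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def log_likelihood(freq_small, size_small, freq_ref, size_ref):
    """Log-likelihood keyness of an item in a small corpus against a reference corpus.

    Expected frequencies assume the item is equally common in both corpora;
    the score is 0 when the relative frequencies are identical and grows as
    they diverge.
    """
    total = freq_small + freq_ref
    expected_small = size_small * total / (size_small + size_ref)
    expected_ref = size_ref * total / (size_small + size_ref)
    score = 0.0
    for observed, expected in ((freq_small, expected_small), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Figures reported by Flowerdew (2008) for "associated with".
rate = per_million(139, 250_000)   # occurrences per million words
negative_share = 135 / 139         # proportion with negative prosody

print(f"{rate:.0f} per million words; {negative_share:.1%} negative")
```

A keyness score for associated with itself cannot be computed from the text above, because the raw frequency of the phrase in the benchmark corpus is not reported; log_likelihood would need that count as its third argument.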

This chapter has shown that while small corpora are not suitable for all types of analysis, a small specialised corpus can nevertheless provide valuable insights into specific areas of language use, and can even have certain advantages over large corpora. The main advantage is in the close link that exists between language patterns and contexts of use, as illustrated throughout this chapter, from corpus design, through compilation and transcription, to corpus analysis and findings. This interplay of language and context in corpus studies can be followed up in other chapters in this volume which deal with special areas of language use. Chapter 19 looks in more detail at what a corpus can tell us about specialist genres, and other chapters focus on specific genres: for example, Coxhead examines English for Academic Purposes, Cotterill looks at forensic linguistics, and Atkins and Harvey explore health communication.

In document Routledge Handbook Cl (Page 102-106)