CHAPTER 2 Definitions and methodology
2.5 Methodology
2.5.1 Corpus linguistics
It has already been mentioned that linguistic corpora will be used as reference corpora in this study. This section gives a brief account of what kinds of corpora exist, what purposes they may serve and, most importantly, which corpora may be useful for the present study.
Corpus-based research gives access to very large amounts of language data. As stated in Johansson (2003: 1), the main advantage of working with text collections drawn from books, articles, the Internet, etc. is that the researcher deals with natural, or authentic, language usage within a certain context. This makes a wide range of linguistic investigations possible. Modern language research suggests the use of text collections specially prepared and “made available in computer-readable form for the purpose of linguistic analysis” (Meyer 2002: preface).
To use the data in the most rational way, it is important to define specific questions, i.e. what we want to find out from the material. Johansson’s advice is to “start with one question [...], continue with new questions that spring from the analyses of the material” (2003: 3). Several research questions have been presented in section 2.3. The question “What are the most striking stylistic differences between Norwegian and English legal language?” is the starting point for the contrastive analysis in Chapter 4. As the analysis progresses, the research questions will be refined accordingly.
Johansson describes work with a corpus as “a kind of a dialog between the researcher and a corpus” (2003: 3). A researcher starts with one question, examines the relevant corpus material, and then continues with new questions that result from the analysis; as soon as new questions arise, the researcher returns to the corpus material, and so forth. This moving back and forth between posing questions and examining the material is part of the method of corpus linguistics. The procedure may be imagined as a spiral made of many layers. Each layer represents corpus data and their analysis, and every new layer is built upon the preceding one. Therefore, the further a layer lies from the centre of the spiral, the more information and evidence it contains. Importantly, one should not expect corpus research to yield interesting results immediately. It may be necessary to conduct many searches, constantly evaluating the results. Thus, introspection is involved even if the method chosen is that of corpus linguistics.
Today, the field of corpus linguistics makes use of a great variety of corpora of different languages, sizes and potential; there are monolingual, bilingual and multilingual corpora. The first-generation computer-readable corpora, such as the Brown corpus of American English and the LOB corpus of British English, were based on texts from the early 1960s and consisted of about 1 million words each. Modern corpora are much larger, and many of them contain both written and spoken parts. Monolingual corpora can be used for studies of language varieties (such as regional dialects), comparative studies of genre, or simply examination of the genuine usage of various lexical items. Some corpora offer just a few functions, while others have great potential, allowing analysis not only on the lexical level, but also on the levels of grammar and syntax. In addition, some corpora provide special functions for the study of phonology.
The first stage in the analysis presented in Chapter 4 will be the examination of Norwegian and English legal texts in order to find lexical items that seem characteristically legal. Lists of legal words from several legal texts will be compiled and systematized in order to obtain an overview of particular patterns, distributions and the like. For this purpose, monolingual corpora of English and Norwegian will be used. For English texts, the British National Corpus (BNC) will be used. For Norwegian texts, there are two corpora which may reveal patterns and interesting features in legal texts: the Oslo Corpus of Tagged Norwegian Texts and the Lexicographic Corpus for Norwegian Bokmål (LBK). More detailed information about these corpora will be given in the following sections. Since the analysis is primarily contrastive, the comparison of Norwegian and English will be central. Therefore, there is a need for a bilingual or multilingual corpus. A corpus that contains both Norwegian and English texts of various genres is the English-Norwegian Parallel Corpus (ENPC).
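The compilation of word lists described above can be illustrated with a minimal sketch. It is not the procedure of any of the corpora mentioned, nor of their search interfaces; it only assumes that a legal text and a reference text are available as plain strings. The function names (tokenize, candidate_legal_words) and the threshold min_ratio are hypothetical and chosen for illustration. A word is treated as a candidate legal item if its relative frequency in the legal text is at least min_ratio times its relative frequency in the reference text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters (including æ, ø, å)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidate_legal_words(legal_text, reference_text, min_ratio=2.0):
    """Return (word, count, ratio) for words over-represented in the legal text.

    Hypothetical helper for illustration only. The ratio compares the word's
    relative frequency in the legal text with its relative frequency in the
    reference text.
    """
    legal_tokens = tokenize(legal_text)
    ref_tokens = tokenize(reference_text)
    legal_counts = Counter(legal_tokens)
    ref_counts = Counter(ref_tokens)

    candidates = []
    for word, count in legal_counts.items():
        legal_rel = count / len(legal_tokens)
        # Add one to the reference count and total so that words absent
        # from the reference text do not cause a division by zero.
        ref_rel = (ref_counts[word] + 1) / (len(ref_tokens) + 1)
        ratio = legal_rel / ref_rel
        if ratio >= min_ratio:
            candidates.append((word, count, round(ratio, 2)))

    # Highest ratio first; ties are ordered alphabetically.
    return sorted(candidates, key=lambda item: (-item[2], item[0]))


# Toy data standing in for a legal text and a general reference text.
legal = "the lessee shall indemnify the lessor and the lessee shall pay"
reference = "the cat sat on the mat and the dog sat too"
result = candidate_legal_words(legal, reference)
```

With real corpus data, a list produced in this way would only be a starting point: each candidate still has to be inspected in context, in line with the question-driven procedure described earlier in this section.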
The following three sections give a brief introduction to these corpora. They also identify the relevant sub-corpora and describe how these sub-corpora are intended to be applied in the analysis in Chapter 4.
2.5.2 Linguistic Corpora for Norwegian (The Oslo Corpus of Tagged Norwegian Texts and