
CHAPTER II. REVIEW OF THE LITERATURE

2.2. Corpora and concordancing

2.2.2. Concordancing

A concordance is a list of all the words found in a specific text or set of texts, each shown in the larger context in which it occurs. Concordancing is a way to access a corpus of texts in order to show how a word or expression is used in a given context (Flowerdew, 1996). Computer-based concordancing programs perform this analysis automatically. Programs such as Longman Mini-Concordancer (Chandler, 1989), MicroConcord (Scott & Johns, 1993), WordSmith (Scott, 2000b) and MonoConc Pro (Barlow, 2002) allow users to retrieve the contexts in which search words or phrases occur. These contexts can be displayed in various ways. The advantages of using computers for corpus linguistic investigations include automatic searching, sorting and scoring. For example, MonoConc Pro 2.2 (Barlow, 2002) looks for the search word in the corpus and displays the results while it is performing the search. It is also capable of counting word and collocation frequencies.

Figure 2.3. Concordancing program: MonoConc Pro 2.2 (Barlow, 2002) and English Gigaword (Graff, 2003)

Figure 2.4. Concordancing program search results: MonoConc Pro 2.2 (Barlow, 2002) and English Gigaword (Graff, 2003)

Figure 2.5. Concordancing program collocation frequency: MonoConc Pro 2.2 (Barlow, 2002) and English Gigaword (Graff, 2003)

In order for these procedures to be carried out, a corpus has to be in plain text format. In other words, a corpus has to be readable by a concordancing program, which will find each occurrence of any requested word and place it in context. The amount of context varies according to the settings the user chooses and the software that is used. Usually, only one line of text is provided, which may or may not be a complete sentence (see Figure 2.3). Concordancing programs can search not only for a single word, but also for all occurrences of words with a specific stem. This can be done using special characters such as the asterisk (*) for partial words or the ‘at’ symbol (@) for any number of words occurring between two search words. For example, a search for “work” by itself produces only results containing “work.” A search for “work*” produces results containing “worker,” “working,” “worked,” etc. in context (see Figure 2.4). Users can also type in groups of words or phrases. The concordancing program then displays the most typical patterns of the given words or phrases. A concordancing program is able to provide additional information such as word frequency lists, collocation frequency, an alphabetical list of all words in a corpus, and the number of times a word occurs (Murphy, 1996; Wichmann, 1995). Figure 2.5 shows the collocation frequency list of “work*.” It indicates that “to” occurs 26 times immediately before the search word (“work*”) and is the most frequent word in that position. The figure also indicates that “in” occurs 34 times immediately after the search word and is the most frequent word in that position.
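The wildcard search and the collocation counts described above can be illustrated with a short Python sketch. It is not how MonoConc Pro is implemented; the function names (tokenize, wildcard_to_regex, adjacent_collocates) and the sample text are invented for illustration, only the asterisk wildcard is handled (the @ operator is not), and sentence boundaries are ignored when counting neighbours.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def wildcard_to_regex(pattern):
    """Compile a search term such as 'work*'; each '*' matches any run of letters."""
    parts = [re.escape(part) for part in pattern.lower().split("*")]
    return re.compile("[a-z']*".join(parts))

def adjacent_collocates(tokens, pattern):
    """Count the words found immediately before and after each match."""
    rx = wildcard_to_regex(pattern)
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right

# Invented sample text standing in for a plain-text corpus.
sample = ("She likes to work in the city. He went to work early. "
          "The workers are working in shifts, and they worked in teams.")
tokens = tokenize(sample)

exact = [t for t in tokens if wildcard_to_regex("work").fullmatch(t)]
stem = [t for t in tokens if wildcard_to_regex("work*").fullmatch(t)]
left, right = adjacent_collocates(tokens, "work*")

print(exact)                 # ['work', 'work']
print(stem)                  # ['work', 'work', 'workers', 'working', 'worked']
print(left.most_common(1))   # [('to', 2)]
print(right.most_common(1))  # [('in', 3)]
```

As in Figure 2.5, the two counters correspond to the position immediately before and immediately after the search word; a fuller tool would also count positions further from the search word.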

Some concordancing programs such as ParaConc (Barlow, 2001) can display search results in multiple languages (see Figure 2.6). One purpose of such programs is to permit investigations of translated texts. ParaConc allows any language pair to be loaded; for example, it can be used for English-Chinese or French-Italian texts. It accepts as many as four different languages, so search results can be displayed in up to four languages at once. When a user clicks on a line in the results window, the corresponding segments in the other languages are highlighted. Figure 2.6 shows the search results for “head” from an English corpus and the corresponding text segments from a French corpus in the lower window.
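The idea of a parallel search can be sketched in a few lines of Python, assuming the texts have already been aligned segment by segment. The aligned list, its English and French sentences, and the function name parallel_search are invented for illustration; the sketch does not reproduce how ParaConc aligns or displays texts.

```python
import re

# Hypothetical pre-aligned corpus: (English segment, French segment) pairs.
aligned = [
    ("He shook his head.", "Il secoua la tête."),
    ("The garden was quiet.", "Le jardin était calme."),
    ("She is the head of the department.", "Elle est la directrice du département."),
]

def parallel_search(pairs, word):
    """Return (index, source, target) for every pair whose source contains the word."""
    rx = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(i, src, tgt) for i, (src, tgt) in enumerate(pairs) if rx.search(src)]

hits = parallel_search(aligned, "head")
for i, en, fr in hits:
    print(f"[{i}] EN: {en}")
    print(f"    FR: {fr}")
```

Because the search is run on the source language only, the translated segment is retrieved by its position in the alignment, not by matching any particular target word. This is what makes such output useful for examining how one word is rendered differently across contexts.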

Before going into detail about the use of concordancing programs, it is necessary to explain how a corpus can be utilized by a concordancing program. The Key Word In Context (KWIC) format is the most popular form of output. In this format, the search word is shown in the middle of the line, surrounded by its authentic context. The scope of the context words can be specified by the user (see Figure 2.7). Usually just a single line of text is provided, which may be a sentence fragment. The KWIC format enhances understanding of the key word by providing reliable and realistic contexts in which the word is used (Murphy, 1996). In some concordancing programs, a specific word order can be used as a search term.
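A minimal KWIC display can be sketched in Python as follows. The function name kwic, its width parameter (the number of context characters on each side) and the sample text are assumptions made for illustration and do not correspond to any particular concordancing program.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per match, with the keyword aligned in a central column."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    rx = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in rx.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()].strip()
        right = flat[m.end():m.end() + width].strip()
        # Right-align the left context and left-align the right context
        # so that every keyword starts in the same column.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Invented sample text standing in for a plain-text corpus.
sample = ("The students work in pairs. After class they continue "
          "to work on their essays at home.")

for line in kwic(sample, "work", width=20):
    print(line)
```

The context is cut at a fixed number of characters, not at sentence boundaries, which is why a KWIC line is often a sentence fragment. Changing width corresponds to the user-specified scope of context mentioned above.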

Figure 2.6. ParaConc (Search word “head”) (Barlow, 2001)

Figure 2.7. Concordancing program search results in KWIC format: MonoConc Pro 2.2 (Barlow, 2002) and English Gigaword (Graff, 2003)

Kennedy and Miceli (2001) asked students to use a corpus while they revised their own written work. At regular intervals, the class was presented with anonymous sample sentences from the previous week's writing. Students worked on these sentences in order to practice ways of using the corpus to correct their writing. The goal of this research was to give students a corpus to use as a primary reference tool while writing. A corpus appropriate for their proficiency level and tasks was provided in order to supply examples of personal writing on general topics.

Dodd (1997) studied the use of a corpus in a language classroom. The students were given a new raw corpus, which they compared with reference works. Dodd's conclusion was that a computer-supported investigation is a powerful but simple tool for language learning. This supports Leech’s (1997) view that a computer promotes a learner-centered approach with an open-ended supply of language data that encourages discovery learning.

In addition to serving pedagogical purposes, concordancing programs have played an important role for linguistic and literary researchers (St. John, 2001). For example, Mintz, Newport, and Bever (2002) investigated linguistic input directed at young children under two and a half years old. They wanted to find out whether this input contains adequate information for the acquisition of the grammatical categories of noun and verb. The input corpora selected for the study were from the CHILDES database (MacWhinney, 2000). The corpora selected from this database contain a significant number of utterances commonly directed at children under two and a half years of age. Two hundred of the most frequent words were used. Low-frequency words were not used because they occur in very few contexts applicable to this particular age group. The results demonstrated that the information in the input could help in constructing the grammatical categories of nouns and verbs. Additionally, they discovered what type of information must be available for learners to categorize nouns and verbs. This study supports nativist claims for the existence of innate lexical categories which, according to the theory, all humans possess and which make it possible to categorize the lexicon at even a very early age.

Concordancing has various applications. Lexicography and dictionary making were the first applications of concordancing (Flowerdew, 1996). The Collins Cobuild Dictionary was a result of authentic concordancing examples. The underlying corpus has also been utilized for literary and linguistic research as well as stylistics. As stated previously, the corpora used in these applications can contain millions of words. These tools challenge fundamental linguistic descriptions because they allow for explanations based on evidence instead of intuition (Sinclair, 1986). Observation of language data seems to be the most reliable source of evidence for certain types of language phenomena, such as frequency, although Chomsky may suggest that this type of quantitative data is meaningless (McEnery & Wilson, 2001).

In addition to the aforementioned benefits, concordancing is extremely efficient and has great potential for innovation. Stevens (1990) characterized concordancing as "economical in terms of time" for carrying out text manipulation, because it requires only a program and a collection of texts. For example, concordancing can be used as a help facility for computerized cloze exercises so that students can find out more about the nature of the word in the gap (Stevens, 1995).
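One possible way to combine a cloze exercise with concordance output is sketched below in Python. This is an assumed design and not Stevens's procedure: the target word is blanked in the exercise sentence, and it is also blanked in the corpus sentences offered as help, so that the student infers the word from several contexts. The function names (make_cloze, masked_contexts) and the corpus sentences are invented for illustration.

```python
import re

def make_cloze(sentence, target, gap="_____"):
    """Replace every whole-word occurrence of the target with a gap."""
    rx = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return rx.sub(gap, sentence)

def masked_contexts(corpus_sentences, target, limit=3):
    """Return up to `limit` corpus sentences containing the target, with it gapped."""
    rx = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    found = [s for s in corpus_sentences if rx.search(s)]
    return [make_cloze(s, target) for s in found[:limit]]

# Invented corpus sentences standing in for a collection of texts.
corpus = [
    "Many students work in the library after class.",
    "The library closes at nine.",
    "Teachers and students work on the project together.",
]

exercise = make_cloze("We usually work in small groups.", "work")
hints = masked_contexts(corpus, "work")

print(exercise)
for hint in hints:
    print("  help:", hint)
```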

In document Effects of using corpora and online reference tools on foreign language writing: a study of Korean learners of English as a second language (Page 32-38)