3.2 Learner language

3.2.3 Development of learner language studies

The characteristics of learner language (LL) can be captured by various approaches. The
most noticeable feature of learner language is its errors, which are often investigated
through error analysis (James, 1998). Error analysis has perhaps had its heyday, but it
has also received severe criticism. With the blossoming of corpus linguistics, error
analysis may find a new way to grow, but the traditional method of analysing learners’
errors has become insufficient. Another dominant classical approach is contrastive
analysis (CA), which compares and contrasts at least two languages. Granger has applied
the idea of contrastive analysis to LL studies under the term ‘Contrastive Interlanguage
Analysis’ (CIA) (Granger, 1996:43), which compares the original and target languages as
well as their translation equivalents. Later, Granger modifies the CIA model into a
comparison of native language (NL) with interlanguage (IL), or of different
interlanguages, i.e. (1) NL vs. IL and (2) IL vs. IL (Granger, 1998a:12). For the former
(NL vs. IL), overuse and underuse are the primary means of determining how the
interlanguage differs. For the latter (IL vs. IL), different varieties of learner

Furthermore, Granger suggests integrating CA and CIA to form a comprehensive account
and thereby increase the validity of learner language research.
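As an illustration of the NL vs. IL comparison, the sketch below shows one way overuse and underuse might be operationalised: the relative frequency of an item in a learner corpus is compared with its relative frequency in a native reference corpus, and a log-likelihood score indicates how marked the difference is. The log-likelihood statistic, the function names and the toy token lists are assumptions made for this example; the sources reviewed here do not prescribe a particular measure.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one item.

    a, b: frequency of the item in the learner (IL) and native (NL) corpus.
    c, d: total number of tokens in the IL and NL corpus (both non-zero).
    """
    expected_il = c * (a + b) / (c + d)
    expected_nl = d * (a + b) / (c + d)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        score += a * math.log(a / expected_il)
    if b > 0:
        score += b * math.log(b / expected_nl)
    return 2 * score


def compare_usage(item, learner_tokens, native_tokens):
    """Label an item as overused or underused in IL relative to NL."""
    a = Counter(learner_tokens)[item]
    b = Counter(native_tokens)[item]
    c, d = len(learner_tokens), len(native_tokens)
    rel_il, rel_nl = a / c, b / d  # relative frequencies
    if rel_il > rel_nl:
        direction = "overuse"
    elif rel_il < rel_nl:
        direction = "underuse"
    else:
        direction = "no difference"
    return direction, log_likelihood(a, b, c, d)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    learner = "i think it is very very good and i think so".split()
    native = "it is good and we believe it is rather useful indeed".split()
    print(compare_usage("very", learner, native))
```

In practice the token lists would come from a learner corpus and a comparable native reference corpus, and a significance threshold would be applied to the score before an item is reported as overused or underused.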

CIA has been adopted widely in learner corpus studies, based on a ‘Computer Learner
Corpus’ (CLC) approach, to use Granger’s term (Granger, 1998a:6). She summarises the
basic features of CLC and the current approaches to analysing it (Granger, 2004). In her
opinion, CLC distinguishes itself from other types of data collection in second language
acquisition (SLA); it has advantages in size, variability, and automation (Granger,
2004:124ff). She also contends that the methodological framework at the heart of CLC
rests mainly on CIA and CEA (Computer-aided Error Analysis). CLC research can also be
classified according to its research design: Granger (1998a:15) distinguishes
‘hypothesis-based’ from ‘hypothesis-finding’ studies. She concludes that the
hypothesis-finding design is the more powerful way to “gain totally new insights into
learner language” (Granger, 1998a:16). One type of study that can benefit from the
‘hypothesis-finding’ design is research on formulaic sequences. Granger and other
scholars have tested whether learner language is composed of ‘individual bricks’ or
‘prefabricated sections’ (De Cock et al., 1998:67), and investigated vague language that
occurred as some phraseological combinations (De Cock et al.,

with different frequencies and functions. In a more recent article, Granger (2005b)
identifies two main strands of research on phraseology. The first is concerned with the
distinction between less fixed multi-word units and free combinations. The second
concentrates on the typical features of formulaic sequences, such as non-compositionality
and fixedness. To compensate for the lack of a broader overview of phraseological
phenomena, she suggests combining a statistical approach with fine-grained linguistic
analysis, using them as filters to yield targets worthy of further investigation. This
thesis follows that suggestion.

3.2.3.1 Corpus approaches to describing LL

With advances in technology, corpora have been used to inform the theory and practice of
second language acquisition, and in recent years CLC studies have proved fruitful in
describing learner language. Here I review a few significant learner corpora and a number
of CLC studies on different aspects of learner language.

Pravec (2002) surveyed the background of several learner corpora. Because of space
limitations, I will focus only on corpora of written texts. To name a few, the
International Corpus of Learner English (ICLE), the Cambridge Learner Corpus (CLC) and
the Longman Learners’ Corpus (LLC) will be described below.

The ICLE is a project created by the Centre for English Corpus Linguistics (CECL) at
Université Catholique de Louvain. The leading researchers include Sylviane Granger, Fanny
Meunier, Estelle Dagneaux, Magali Paquot and Sylvie De Cock. It is an international
project involving collaboration with researchers in many countries, such as China and
Germany, and the corpus contains over fourteen varieties of learner language. A comparable
reference corpus, LOCNESS, containing both British and American English, was built as
well. All the ICLE sub-corpora were compiled in the same format and designed using the
same rules in order to ensure comparability. Many studies have examined the data from
ICLE. For instance, the directing researchers mentioned above have produced papers on many
aspects of learner language (De Cock, 2000, 2001; Granger, 2005a; Meunier & Granger,
2008). Other researchers, such as Kaszubski (2000), have tackled the phraseological issues
found in a sub-corpus of ICLE. In addition, two projects derived from ICLE, the
Longitudinal Database of Learner English (LONGDALE) and the Varieties of English for
Specific Purposes database (VESPA), were launched in 2008. The LONGDALE project collects

or academic purposes. Besides these projects, the CECL is also directing a project on
phraseology and discourse. Other learner corpora include the Cambridge Learner Corpus
(CLC) and the Longman Learners’ Corpus (LLC). The Cambridge Learner Corpus is a collection
of Cambridge ESOL exams compiled by Cambridge University Press. Also compiled on a
commercial basis, the LLC comprises 10 million words and is used mainly to inform the
content of textbooks. Neither the Cambridge Learner Corpus nor the LLC is publicly
available, whereas ICLE is, which makes ICLE more useful to researchers.

Although learner corpora can contribute substantially to the understanding of LL, they
have some limitations. They provide little information on learners’ receptive ability,
motivation and reaction to particular teaching methods; in addition, they are particularly
criticised for their inability to reveal what does not occur in LL (Nesselhauf,
2004:131-132).

Despite these deficiencies, learner corpora can provide much evidence for describing
learner language. Studies based on learner corpora have probed the special characteristics
of LL at many levels. Some researchers have examined grammar; some are interested in lexis
and phraseology; others are drawn to issues of discourse and style. Granger and Arts
(1998), for example, explore tag sequences in LL. In terms of lexis, Ringbom

lexical choices; Granger and Rayson (1998) also analyse learners’ patterns of grammatical
categories. At the stylistic and discourse levels, learner language has been found to lack
lexical variation and to tend towards overstatement and wordiness (Lorenz, 1998:64). In
the area of phraseology, De Cock et al. (1998) observe prefabs such as two-word
combinations and expressions of vagueness. All of these studies have approached LL using
CLC techniques.
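Recurrent two-word combinations of the kind mentioned above can be retrieved with a simple frequency count. The following is a minimal sketch, assuming a pre-tokenised text and an arbitrary frequency threshold (`min_freq`); the function name and the toy sentence are hypothetical, and this is not the procedure used by De Cock et al. (1998).

```python
from collections import Counter


def two_word_combinations(tokens, min_freq=2):
    """Return recurrent two-word sequences with their frequencies.

    tokens: list of word tokens in running order.
    min_freq: assumed cut-off; combinations below it are discarded.
    """
    # Pair every token with its right-hand neighbour and count the pairs.
    pairs = Counter(zip(tokens, tokens[1:]))
    return [
        (" ".join(pair), freq)
        for pair, freq in pairs.most_common()
        if freq >= min_freq
    ]


if __name__ == "__main__":
    # Toy utterance, invented for illustration only.
    sample = "you know i mean you know sort of thing you know".split()
    print(two_word_combinations(sample))
```

A raw recurrence count of this kind only supplies candidates; deciding which of them are genuinely formulaic still requires linguistic analysis.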

In document The phraseology of phrasal verbs in English: a corpus study of the language of Chinese learners and native English writers (Page 87-92)