3.2 Learner language
3.2.3 Development of learner language studies
The characteristics of learner language can be captured by various approaches. Its
most noticeable features are its errors, which are often investigated through error
analysis (James, 1998). Error analysis has arguably passed its heyday, and it has also received
severe criticism. With the blossoming of corpus linguistics, error analysis may find
new ways to develop, but the traditional method of analysing learners’ errors has become
insufficient. Another dominant classical approach is contrastive analysis (CA), which
compares and contrasts two or more languages. Granger has applied the idea of contrastive analysis
to LL studies under the name ‘Contrastive Interlanguage Analysis’
(CIA) (Granger, 1996:43), which involves comparing the original and target
languages as well as their translation equivalents. Later, Granger
refined the CIA model into two types of comparison: (1) native language vs. interlanguage (NL vs. IL)
and (2) interlanguage vs. interlanguage (IL vs. IL) (Granger, 1998a:12). For the
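former (NL vs. IL), overuse and underuse are the primary means of identifying
interlanguage features. For the latter (IL vs. IL), different varieties of learner
language, produced by learners with different mother-tongue backgrounds, are compared
to distinguish features shared by learners in general from those that may be attributable to L1 transfer.
Furthermore, Granger suggests integrating CA and CIA into a comprehensive
model to increase the validity of learner language research.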
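The NL vs. IL comparison rests on deciding whether an item is used significantly more or less often by learners than by native speakers. A minimal sketch of one common way to do this is given below, assuming the log-likelihood (G²) statistic widely used in corpus linguistics and a critical value of 3.84 (p < 0.05, one degree of freedom). It is not necessarily the procedure used in the studies cited here; the function names and the example counts are hypothetical.

```python
import math


def log_likelihood(freq_il: int, size_il: int, freq_nl: int, size_nl: int) -> float:
    """Log-likelihood (G2) for one item in a learner (IL) and a native (NL) corpus."""
    total_freq = freq_il + freq_nl
    total_size = size_il + size_nl
    # Expected frequencies if the item were equally common in both corpora.
    expected_il = size_il * total_freq / total_size
    expected_nl = size_nl * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_il, expected_il), (freq_nl, expected_nl)):
        if observed > 0:  # 0 * log(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def classify(freq_il, size_il, freq_nl, size_nl, critical=3.84):
    """Label an item as overused, underused or not significantly different in IL."""
    g2 = log_likelihood(freq_il, size_il, freq_nl, size_nl)
    if g2 < critical:
        return "no significant difference", g2
    # The direction is decided by comparing normalised (relative) frequencies.
    if freq_il / size_il > freq_nl / size_nl:
        return "overuse", g2
    return "underuse", g2


# Hypothetical counts: an item occurring 300 times in a 100,000-word learner
# corpus and 100 times in a 100,000-word native corpus.
label, g2 = classify(300, 100_000, 100, 100_000)
print(label, round(g2, 2))  # overuse 104.65
```

Normalising by corpus size before deciding the direction matters because learner and reference corpora are rarely the same size; the significance test guards against treating small raw differences as genuine overuse or underuse.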
CIA has been widely adopted in learner corpus studies that follow what Granger calls
a ‘Computer Learner Corpus’ (CLC) approach (Granger, 1998a:6). She summarises
the basic features of CLC and current approaches to its analysis (Granger,
2004). In her view, CLC differs from other types of SLA data collection:
it has advantages in size, variability and automation (Granger, 2004:124ff). She also
contends that the methodological framework at the heart of CLC research rests mainly on CIA
(Contrastive Interlanguage Analysis) and CEA (Computer-aided Error Analysis). In terms of
research design, Granger (1998a:15) classifies CLC research into ‘hypothesis-based’
and ‘hypothesis-finding’ studies.
She concludes that the hypothesis-finding approach is the more powerful of the two for
those who wish to “gain totally new insights into learner language” (Granger, 1998a:16).
One type of study that can benefit from a hypothesis-finding design is research on
formulaic sequences. Granger and other scholars have tested whether learner language
is composed of ‘individual bricks’ or ‘prefabricated sections’ (De Cock et al., 1998:67),
and have investigated vague language realised as phraseological combinations
(De Cock et al., 1998), which learners were found to use, but
with different frequencies and functions from native speakers. In a more recent article (Granger, 2005b),
Granger identifies two main strands of research on phraseology. The first is
concerned with distinguishing less fixed multi-word units from free combinations. The
second concentrates on the typical features of formulaic sequences, such as
non-compositionality and fixedness. To remedy the lack of a broader overview of
phraseological phenomena, she concludes by suggesting that the statistical approach be
combined with fine-grained linguistic analysis, so that together they act as filters
that yield targets worthy of further investigation. This thesis follows that suggestion.
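The idea of combining a statistical pass with a linguistic filter can be illustrated with a toy n-gram extractor. The sketch below assumes a simple regular-expression tokeniser, a raw-frequency threshold and a range threshold (the number of texts a sequence occurs in) as the statistical step, and, as a deliberately crude stand-in for fine-grained linguistic analysis, a filter that discards sequences containing punctuation. All function names, thresholds and sample sentences are hypothetical and are not Granger’s procedure.

```python
import re
from collections import Counter

# Hypothetical tokeniser: lower-case words (with apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"[a-z']+|[.,!?;:]")
PUNCT = set(".,!?;:")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_candidates(texts, n=2, min_freq=2, min_range=2):
    """Statistical step (frequency and range), then a simple linguistic filter."""
    freq = Counter()        # total occurrences across all texts
    text_range = Counter()  # number of texts each n-gram occurs in
    for text in texts:
        grams = ngrams(tokenize(text), n)
        freq.update(grams)
        text_range.update(set(grams))
    candidates = {}
    for gram, count in freq.items():
        if count < min_freq or text_range[gram] < min_range:
            continue
        # Crude linguistic filter: drop sequences that cross punctuation.
        if any(tok in PUNCT for tok in gram):
            continue
        candidates[" ".join(gram)] = count
    # Most frequent first; ties broken alphabetically.
    return dict(sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0])))


# Invented sample sentences, for illustration only.
texts = [
    "I think that it is true.",
    "I think it is a problem, and so on.",
    "Well, I think so and so on.",
]
print(extract_candidates(texts))
# {'i think': 3, 'and so': 2, 'it is': 2, 'so on': 2}
```

In a real study the filter would be far richer (for instance, part-of-speech patterns or manual inspection of concordance lines); the point here is only the division of labour between a statistical pass that proposes candidates and a linguistic pass that discards uninteresting ones.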
3.2.3.1 Corpus approaches to describing LL
With advances in technology, corpora have been used to inform the theory and
practice of second language acquisition. As a result, CLC studies have in recent years
been fruitful in describing learner language. Here I review a few
significant learner corpora and a number of CLC studies on different aspects of learner
language.
Pravec (2002) surveyed background information on several learner corpora.
Because of space limitations, I focus only on corpora of written texts. Three of these,
the International Corpus of Learner English (ICLE), the Cambridge Learner Corpus (CLC)
and the Longman Learners’ Corpus (LLC), are described below.
The ICLE is a project created by the Centre for English Corpus Linguistics (CECL)
at the Université Catholique de Louvain. The leading researchers include Sylviane Granger,
Fanny Meunier, Estelle Dagneaux, Magali Paquot and Sylvie De Cock. It is an
international project involving researchers in many countries, such as China and
Germany, and it contains over fourteen varieties of learner language. A
comparable reference corpus of native-speaker writing, LOCNESS, containing both British
and American English, was built as well. All ICLE sub-corpora were compiled in the same
format and according to the same design criteria in order to ensure comparability. Many studies have
been conducted on ICLE data. For instance, the leading researchers
mentioned above have written papers on many aspects of learner language (De Cock,
2000, 2001; Granger, 2005a; Meunier & Granger, 2008). Other researchers, such
as Kaszubski (2000), have tackled phraseological issues in a sub-corpus of
ICLE. In addition, two projects, the Longitudinal Database of Learner English
(LONGDALE) and the Varieties of English for Specific Purposes database (VESPA),
both derived from ICLE, were launched in 2008. The LONGDALE project collects
longitudinal data from the same learners over a period of time, while VESPA collects
learner texts written for specific professional
or academic purposes. Besides these projects, the CECL also directs a project on
phraseology and discourse. The Cambridge Learner Corpus is a commercially compiled
collection of Cambridge ESOL exam scripts, built by Cambridge University Press. The LLC,
also compiled on a commercial basis, comprises 10 million words and is used mainly to
inform the content of textbooks. Unlike ICLE, neither the Cambridge Learner Corpus nor
the LLC is publicly available, which makes ICLE more accessible to researchers.
Although learner corpora can contribute substantially to our understanding of LL,
they have some limitations. They provide little information on learners’ receptive
ability, motivation and reactions to particular teaching methods; in addition, they
are particularly criticised for their inability to reveal what does not occur in LL
(Nesselhauf, 2004:131-132).
Despite these deficiencies, learner corpora can provide much evidence for describing
learner language. Studies based on learner corpora have probed the characteristics of
LL at many levels. Some researchers have examined grammar; some are interested in
lexis and phraseology; others focus on discourse and stylistic issues. Aarts
and Granger (1998), for example, explore tag sequences in LL. In terms of lexis, Ringbom
(1998) examines vocabulary frequencies in advanced learner English, shedding light on learners’
lexical choices; Granger and Rayson (1998) analyse learners’ use of
grammatical categories. At the stylistic and discourse levels, learner language has been
found to lack lexical variation and to tend towards overstatement and wordiness
(Lorenz, 1998:64). In the area of phraseology, De Cock et al. (1998) observe prefabs
such as two-word combinations and vague expressions. All of these studies
approach LL using CLC techniques.