
CHAPTER 3: METHODOLOGY

3.6 CAF Measures

3.6.2 Fluency measure

Fluency of language production was measured as the total number of words in each essay. Because essay writing was limited to 30 minutes, the number of words produced in the allotted time is a suitable measure of fluency. Text length is a principal fluency measure in L2 writing studies (Wolfe-Quintero et al., 1998) and is more valid than the total number of T-units or clauses (Polio, 2001). Essay length was obtained from the word count in Microsoft Word.

3.6.3 Lexical complexity measures

Lexical complexity was captured through the sub-constructs of lexical diversity, lexical sophistication, and lexical density. To measure lexical diversity, the vocd D measure (Malvern et al., 2004) from the Computerized Language Analysis (CLAN) programs of the CHILDES project (MacWhinney, 2000) was used. Unlike many lexical diversity measures such as the type-token ratio, the vocd D measure is less sensitive to text length because it is based on 100 random samples drawn at each sample size from 35 to 50 words within a text, and it is reported to be reliable for text lengths in the ranges of 100-400, 200-500, 250-666, and 400-1000 words (McCarthy & Jarvis, 2007). This measure has been widely used in L1 studies and is gaining popularity in L2 studies. The vocd D measure suits the essay data for this study well, since all but two of the essays collected fall within the range of 100-400 words.
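The sampling logic behind D can be illustrated with a minimal sketch. It assumes the commonly cited model curve TTR = (D/N)(sqrt(1 + 2N/D) - 1), draws 100 random samples at each size from 35 to 50 tokens, and picks the D whose curve best fits the mean type-token ratios. It is not CLAN's implementation: the function names (`ttr_model`, `mean_ttr`, `estimate_d`), the grid of candidate D values, and the fixed seed are assumptions made for illustration only.

```python
import math
import random


def ttr_model(n, d):
    """TTR predicted by the D model curve for a sample of n tokens."""
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def mean_ttr(tokens, n, trials, rng):
    """Mean type-token ratio over `trials` random samples of n tokens."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(tokens, n)  # sampling without replacement
        total += len(set(sample)) / n
    return total / trials


def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Estimate D by least squares over a coarse grid (illustrative only)."""
    tokens = list(tokens)
    if len(tokens) < max(sizes):
        raise ValueError("text is shorter than the largest sample size")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = [(n, mean_ttr(tokens, n, trials, rng)) for n in sizes]

    def sse(d):
        # Sum of squared differences between observed and modelled TTR.
        return sum((ttr - ttr_model(n, d)) ** 2 for n, ttr in observed)

    # Hypothetical search grid: D from 1.0 to 300.0 in steps of 0.1.
    candidates = [x / 10 for x in range(10, 3001)]
    return min(candidates, key=sse)
```

Because each sample has a fixed size regardless of essay length, the resulting estimate depends less on text length than a raw type-token ratio does. A call such as `estimate_d(essay_text.lower().split())` would return one value per essay, where `essay_text` is a hypothetical string holding the spell-corrected essay.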

For lexical sophistication, the proportion of sophisticated word types in each essay was used as the measure. The concept of sophisticated words is based on Laufer and Nation (1995), in which words beyond the most frequent 2,000 words in English are classified as more advanced, sophisticated, and lower-frequency words. The sophisticated words include most academic vocabulary and domain-specific words, as well as other less frequently used words. In the current study, the most frequent 2,000 lemmas used as the basic words are based on the second release of the American National Corpus (ANC; Reppen, Ide, & Suderman, 2005). Currently, the open ANC contains 15 million words of contemporary American English from written and spoken texts of all genres produced since 1990, and the second release of the ANC contains 22 million words from the full corpus, which are annotated for lemma, part of speech, and so on (ANC, 2014).

To obtain the indices for the proportion of sophisticated word types in each essay, the Lexical Complexity Analyzer (Lu, 2012) was used; the software automates the measure and provides counts of total word types and of sophisticated word types based on the 2,000 most frequent ANC lemmas. A list of these lemmas was obtained from Lu, the author of the Lexical Complexity Analyzer. Following Laufer and Nation (1995), proper nouns in each essay that were not among the 2,000 most frequent lemmas were deleted from the sophisticated word types. Proper nouns were identified automatically by processing the essays in batches through the RANGE program (Heatley, Nation, & Coxhead, 2002); with the default option, proper nouns appear under “Types Not Found In Any List” for each essay. In addition to the proper nouns, prompt words that were not among the 2,000 most frequent lemmas were also deleted from the sophisticated word types of each essay. There were four such prompt words: “assignment” in the narrative prompt, “efficiency” and “disagree” in the argumentative prompt, and “underdeveloped” in the lower topic familiarity prompt. For each essay, the number of word types for the relevant proper nouns and prompt words was subtracted from the number of sophisticated word types, and the result was divided by the total number of word types in the essay to yield the final index of the proportion of sophisticated word types.
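The adjustment described above amounts to a small set computation, sketched below under stated assumptions. The function name `sophisticated_type_ratio` and its parameters are hypothetical, the input is assumed to be already lemmatized and spell-corrected, and the toy word lists stand in for the ANC lemma list and the RANGE output; this is not the Lexical Complexity Analyzer's code.

```python
def sophisticated_type_ratio(word_types, basic_lemmas, excluded_types):
    """Proportion of sophisticated word types after exclusions.

    word_types:     iterable of (lemmatized) word types in one essay
    basic_lemmas:   set standing in for the 2,000 most frequent lemmas
    excluded_types: proper nouns and prompt words to discount
    """
    types = set(word_types)
    if not types:
        raise ValueError("essay contains no word types")
    # Types outside the basic list count as sophisticated.
    sophisticated = {w for w in types if w not in basic_lemmas}
    # Remove proper nouns and prompt words from the sophisticated set only;
    # the denominator remains the total number of word types.
    adjusted = sophisticated - set(excluded_types)
    return len(adjusted) / len(types)


# Toy example with made-up lists.
essay_types = ["the", "student", "finish", "assignment", "meticulous", "london"]
basic = {"the", "student", "finish"}
excluded = {"assignment", "london"}  # one prompt word, one proper noun
ratio = sophisticated_type_ratio(essay_types, basic, excluded)
```

In the toy example only "meticulous" remains as a sophisticated type, so the index is one type out of six.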

Finally, lexical density, the number of lexical words divided by the total number of words in a text, was measured, and indices were obtained from the Lexical Complexity Analyzer (Lu, 2012), which automates the measure. Lexical words are contrasted with functional or grammatical words such as articles, prepositions, and pronouns. Although this contrast is widely accepted, previous studies vary in how they define and count lexical words. In Lu (2012), lexical words include “nouns, adjectives, verbs (excluding modal verbs, auxiliary verbs, ‘be,’ and ‘have’), and adverbs with an adjectival base …” (p. 192).
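As a rough illustration of how this definition translates into a count, the sketch below computes lexical density from tokens that have already been lemmatized and tagged. The coarse tag names, the function names, and in particular the "-ly" test used as a stand-in for "adverbs with an adjectival base" are simplifying assumptions for illustration; they do not reproduce the rules implemented in the Lexical Complexity Analyzer.

```python
# Hypothetical coarse tags that may mark lexical words.
LEXICAL_TAGS = {"NOUN", "ADJ", "VERB", "ADV"}
# Verb lemmas excluded from the lexical count under the quoted definition.
EXCLUDED_VERB_LEMMAS = {"be", "have"}


def is_lexical(lemma, tag):
    """Decide whether one (lemma, tag) pair counts as a lexical word."""
    if tag not in LEXICAL_TAGS:
        return False  # function words, modals, auxiliaries, etc.
    if tag == "VERB":
        return lemma not in EXCLUDED_VERB_LEMMAS
    if tag == "ADV":
        # Crude assumption: treat -ly adverbs as having an adjectival base.
        return lemma.endswith("ly")
    return True


def lexical_density(tagged_tokens):
    """Number of lexical words divided by the total number of words."""
    if not tagged_tokens:
        raise ValueError("no tokens supplied")
    lexical = sum(1 for lemma, tag in tagged_tokens if is_lexical(lemma, tag))
    return lexical / len(tagged_tokens)


# Toy sentence, already lemmatized and tagged by hand.
sample = [
    ("she", "PRON"), ("have", "VERB"), ("quickly", "ADV"), ("write", "VERB"),
    ("a", "DET"), ("long", "ADJ"), ("essay", "NOUN"), ("very", "ADV"),
]
density = lexical_density(sample)
```

Here four of the eight tokens ("quickly", "write", "long", "essay") count as lexical, which shows how the choice of what to exclude directly changes the index.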

Following the procedure in Laufer and Nation (1995), all spelling errors in each essay were corrected before indices for any of the lexical complexity measures were obtained from the computer programs.

3.6.4 Syntactic complexity measures

Eight measures were used for syntactic complexity (SC), representing different dimensions of this multi-dimensional construct (Norris & Ortega, 2009): two global SC measures, mean length of sentence (MLS) and mean length of T-unit (MLTU); one clausal coordination measure, T-units per sentence (TU/S); one measure tapping into overall clause complexity, mean length of clause (MLC); two subordination measures, finite dependent clauses per T-unit (DC/TU) and non-finite elements per clause (NFE/C); one phrasal coordination measure, coordinate phrases per verb phrase (CP/VP); and one noun-phrase complexity measure, complex noun phrases per verb phrase (CNP/VP). The definitions of the eight measures and the sub-constructs they represent are summarized in Table 3.6, adapted from Yang, Lu, and Weigle (under revision).

Table 3.6

Syntactic Complexity Measures Used

| Sub-construct | Measure | Definition |
|---|---|---|
| Overall sentence complexity | Mean length of sentence (MLS) | Number of words divided by number of sentences |
| Clausal coordination | T-units per sentence (TU/S) | Number of T-units divided by number of sentences |
| Overall T-unit complexity | Mean length of T-unit (MLTU) | Number of words divided by number of T-units |
| Clausal subordination | Dependent clauses per T-unit (DC/TU) | Number of dependent clauses divided by number of T-units |
| Overall clause complexity | Mean length of clause (MLC) | Number of words divided by number of clauses |
| Phrasal coordination | Coordinate phrases per verb phrase (CP/VP) | Number of coordinate phrases divided by number of verb phrases |
| Noun phrase complexity | Complex NPs per verb phrase (CNP/VP) | Number of complex NPs divided by number of verb phrases |
| Non-finite elements/subordination | Non-finite elements per clause (NFE/C) | Number of non-finite elements divided by number of clauses |

Figure 3.1, adapted from Yang, Lu, and Weigle (under revision), shows the hierarchical relationships among the SC sub-constructs and their measures and how they represent SC as a multi-dimensional construct. The current study, following the definitions given in grammar theories (Cristofaro, 2003; Givón, 2008; Halliday & Matthiessen, 2004; Langacker, 2008), regards both finite and non-finite dependent structures as subordination and thus examines both DC/TU and NFE/C. Although previous writing studies have frequently examined the amount of finite subordination, non-finite subordination has not received due attention. Further, following the existing writing literature, clause in this study refers only to finite clauses (see Hunt, 1965; Lu, 2011; Norris & Ortega, 2009; Polio, 1997), and non-finite structures are thus referred to as non-finite elements.

Figure 3.1. A Multi-dimensional Representation of Syntactic Complexity

[Figure 3.1 node labels: Overall Sentence Complexity (MLS); Clausal Coordination (TU/S); Overall T-unit Complexity (MLTU); Clausal Subordination (Finite) (DC/TU); Overall Clause Complexity (MLC); Phrasal Coordination (CP/VP); Non-finite Elements/Subordination (NFE/C); Noun-Phrase Complexity (CNP/VP)]

Indices for the eight SC measures were generated for each essay with a computational tool, the L2 Syntactic Complexity Analyzer (L2SCA; Lu, 2010), with some necessary minor adaptations as described below. Explicit definitions of the linguistic units relevant to this study and automated in L2SCA (sentence, T-unit, clause, dependent clause, coordinate phrase, and verb phrase) can be found in Lu (2010, 2011). The original version of L2SCA provides frequency counts of these and other linguistic units and generates 14 different SC indices for a given text. It generated the indices for MLS, MLTU, TU/S, DC/TU, and MLC for each essay in this study. CP/VP was calculated by dividing the frequency count of coordinate phrases (CP) by the frequency count of verb phrases (VP). Complex noun phrases (CNP) are defined in this study as noun phrases that contain one or more of the following: pre-modifying adjectives, post-modifying prepositional phrases, and post-modifying appositives (see, e.g., Biber, Gray, & Poonpon, 2011). For this study and for Yang, Lu, and Weigle (under revision), Lu, the author of L2SCA, modified the pattern for identifying complex nominals in the original L2SCA to match this definition of complex noun phrases, so that frequency counts of CNP could be automated. CNP counts were then divided by VP counts to generate CNP/VP for each essay. Finally, L2SCA calculates verb phrases per clause (VP/C) but not non-finite elements per clause. NFE/C was therefore computed by subtracting 1 from VP/C, because in L2SCA each clause contains one finite VP and the other VPs are therefore non-finite.

As for the reliability of L2SCA, Lu (2010) reports that the tool is highly reliable in the production-unit frequency counts and the SC indices generated for college-level ESL writing at the intermediate and high proficiency levels, based on an analysis of and comparison with human coding of sample essays produced by Chinese college-level EFL writers.
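The eight indices defined in Table 3.6 can all be derived from eight frequency counts per essay, as the following minimal sketch shows. The `UnitCounts` class, its field names, and the example numbers are hypothetical and do not correspond to L2SCA's actual output format; the sketch only restates the arithmetic, including NFE/C computed as VP/C minus 1.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Frequency counts for one essay (hypothetical field names)."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    dependent_clauses: int
    verb_phrases: int
    coordinate_phrases: int
    complex_noun_phrases: int


def sc_indices(c):
    """Compute the eight syntactic complexity indices from raw counts."""
    return {
        "MLS": c.words / c.sentences,
        "MLTU": c.words / c.t_units,
        "TU/S": c.t_units / c.sentences,
        "MLC": c.words / c.clauses,
        "DC/TU": c.dependent_clauses / c.t_units,
        "CP/VP": c.coordinate_phrases / c.verb_phrases,
        "CNP/VP": c.complex_noun_phrases / c.verb_phrases,
        # Each clause has one finite VP, so the remaining VPs per clause
        # are taken to be non-finite elements: NFE/C = VP/C - 1.
        "NFE/C": c.verb_phrases / c.clauses - 1,
    }


# Made-up counts for a single essay.
example = UnitCounts(
    words=300, sentences=15, t_units=20, clauses=30,
    dependent_clauses=10, verb_phrases=45,
    coordinate_phrases=9, complex_noun_phrases=18,
)
indices = sc_indices(example)
```

With these made-up counts, 45 verb phrases over 30 clauses gives VP/C = 1.5, so NFE/C is 0.5, meaning on average one non-finite element for every two clauses.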