Chapter 4: Measuring Constructs and Constructing Measures

4.5 Theoretical background to constructs and operationalisations

The eight constructs constituting the taxonomy of writing are described in more detail in the following sections. In these sections the theoretical basis of the constructs is discussed, followed by examples of research that has identified discourse analytic measures to operationalize the three different constructs. The main aim of this section is to identify measures which have successfully distinguished different proficiency levels of writing. Based on the findings of the review of the literature, a summary of suitable measures for the empirical investigation will be presented.

4.5.1 Accuracy, Fluency and Complexity

The following section discusses the theoretical basis underlying the analytic measures of accuracy, fluency and complexity. First, a theoretical framework based on an information-processing model is presented and then each measure is described in detail. In these sections, the varying measures previous studies have employed to operationalize the different concepts are investigated.

Measures of accuracy, fluency and complexity are often used in second language acquisition research because they provide a balanced picture of learner language (Ellis & Barkhuizen, 2005). Accuracy refers to ‘freedom of error’ (Foster & Skehan, 1996, p. 305), fluency refers to ‘the processing of language in real time’ (Schmidt, 1992, p. 358) where there is ‘primacy of meaning’ (Foster & Skehan, 1996, p. 304) and complexity is ‘the extent to which learners produce elaborated language’ (Ellis & Barkhuizen, 2005).

In 1981, Meisel, Clahsen and Pienemann developed the Multidimensional Model of L2 acquisition, which proposed that learners differ in their orientation to learning and that this influences their progress in different areas of L2 knowledge. Learners with a ‘segregative orientation’, for example, are likely to achieve functional ability at the expense of complexity and possibly accuracy. In contrast, learners with an ‘integrative orientation’ may prioritize accuracy and complexity at the expense of fluency. Underlying Meisel et al.’s model is the assumption that L2 learners might experience difficulty in focussing on content and form simultaneously and therefore need to choose what to pay attention to.

A possible explanation of this phenomenon lies in theories of second language acquisition that propose a limited processing capacity (e.g. Skehan, 1998b). In the case of output, or more specifically writing, which is the focus of this study, learners need to access both world knowledge and L2 knowledge from their long-term memories and hold these in their short-term memories in order to construct messages that represent the meaning they intend and which are at the same time linguistically appropriate.

Focussing more closely on L2 knowledge, Skehan (1998b) proposes that it is stored in two forms, one being exemplar-based knowledge and the other one rule-based knowledge. The former consists of chunks or formulaic expressions which can be accessed relatively effortlessly and therefore are able to conserve valuable processing resources. This particular component of L2 knowledge contributes to increased production and fluency. The other component of L2 knowledge, the rule-based system, stores complex linguistic rules which allow the speaker to form an infinite number of well-formed sentences in innovative ways. But this is more costly in terms of processing capacity and this knowledge is harder to access if limited planning time is available.

Skehan (1998b) uses the model proposed above to suggest three aspects underlying L2 performance (see Figure 10 below). Learner production is to be analysed with an initial partition between meaning and form. Form can further be subdivided into control and restructuring. Meaning is reflected in fluency, while form is either displayed in accuracy (if the learner prioritizes control) or in complexity (if opportunities for restructuring arise because the learner is taking risks).

Figure 10: Skehan’s three aspects of L2 performance (from Ellis & Barkhuizen, 2005)

Skehan (1996, p. 50) considers the possible results of learners allocating their attentional resources in a certain way. He argues that a focus on accuracy makes it less likely that interlanguage change will occur (production will be slow and probably consume a large part of the attentional resources). A focus on complexity and the process of restructuring increases the chance that new forms can be incorporated in the interlanguage system. A focus on fluency will lead to language being produced more quickly and with lower attention to producing accurate language and incorporating new forms. He proposes that as learners do not have enough processing capacity available to attend to all three aspects equally, it is important to understand the consequences of allocating resources in one direction or another. A focus on performance is likely to prioritize fluency, with restructuring and accuracy assigned lesser importance. A focus on development might shift the concentration to restructuring, with accuracy and fluency becoming less important.

Discourse analytic measures of accuracy, fluency and complexity are based on an information-processing framework of L2 acquisition and are therefore appropriate for investigating L2 production. They have been used in a variety of studies investigating task difficulty (e.g. Iwashita, McNamara, & Elder, 2001; Skehan, 1996), and effectiveness of planning time (e.g. Crookes, 1989; Ellis, 1987; Ellis & Yuan, 2004; Mehnert, 1998; Ortega, 1999; Wigglesworth, 1997) as well as the effects of different teaching techniques (e.g. Ishikawa, 1995).

In the context of language testing, Iwashita et al. (2001) have criticized the measures of accuracy, fluency and complexity used in research as being too complex and time consuming to be used under operational testing conditions. They call for more practical and efficient measures of ability that are not as sensitive to variations in task structure and processing conditions. In their study, they propose a rating scale based on aspects of accuracy, fluency and complexity.

Tavakoli and Skehan (2005) demonstrated the potential usefulness of discourse analytic measures of accuracy, fluency and complexity for language testing when they performed a principal components analysis on the oral dataset of their study. The aim was to show that the dependent variables of their study were in fact distinct factors. The factor analysis produced a three-factor solution. The results can be seen in Table 12 below⁴.

As Table 12 shows, Factor 1 is made up of six measures (length of run, speech rate, total amount of silence, total time spent speaking, number of pauses and length of pauses).

Table 12: Factor analysis for measures of accuracy, fluency and complexity (Tavakoli and Skehan, 2005)

These measures represent what the authors refer to as the temporal aspects of fluency. The second factor is based on the measures of reformulations, false starts, replacements and repetitions. These measures are associated with another aspect of fluency, namely repair fluency (e.g. Skehan, 2001). The third factor has loadings of measures of accuracy and complexity as well as length of run. This indicates that more accurate language was also more complex. These loadings also suggest that the measures represent the same underlying constructs, which confirms Skehan’s (1998b) model of task performance according to which accuracy and complexity are both aspects of form, while fluency is meaning-oriented. The results of this factor analysis are potentially useful for the field of language testing, especially rating scale design, as it can be shown which measures are in fact distinct entities and can therefore be represented separately on a rating scale. It is worth noting, however, that the research investigated oral language use and that the results may not be applicable to written production.

In the three sections below discourse analytic measures of accuracy, fluency and complexity are examined in more detail. Definitions are given and commonly used measures are reviewed.

4.5.1.1 Accuracy

Polio (1997) reviewed several studies that employed measures of accuracy. Some studies used holistic measures in the form of a rating scale (looking at the accuracy of syntax, morphology, vocabulary and punctuation), whilst others used more objective measures like error-free t-units⁵. Others counted the number of errors with or without classifying them.

The accuracy of written texts has been analyzed through a number of discourse analytic measures. Usually, errors in the text are counted in some fashion. Two approaches have been developed. The first one involves focusing on whether a structural unit (e.g. clause, t-unit) is error free. Typical measures found in the literature include the number of error-free t-units per total number of t-units or the number of error-free clauses per total number of clauses. For this measure, a decision has to be made as to what constitutes an error. According to Wolfe-Quintero et al. (1998), this decision might be quite subjective as it might depend on the researcher’s preferences or views on what constitutes an error for a certain population of students. Error-free measures of accuracy have been criticized by Bardovi-Harlig and Bofman (1989) for not being sufficiently discriminating because a unit with only one error is treated in the same way as a unit with more than one error. Furthermore, error-free measures do not disclose the types of errors that are involved, although some might impede communication more than others. In light of these criticisms, a second approach to measuring accuracy was developed based on the number of errors in relation to a certain production unit (e.g. the number of errors per t-unit). One problem of this method is that all errors are still given the same weight. Some researchers (e.g. Homburg, 1984) have developed a system of coding errors according to gravity, but Wolfe-Quintero et al. (1998) argue that these systems are usually based on the intuitions of the researchers rather than being empirically based.

Several studies have found a relationship between the number of error-free t-units and proficiency as measured by program level (Hirano, 1991; Sharma, 1980; Tedick, 1990), standardized test scores (Hirano, 1991), holistic ratings (Homburg, 1984; Perkins, 1980), grades (Tomita, 1990) or comparison with native speakers (Perkins & Leahy, 1980). Two studies found no relationship between error-free t-units and grades (Kawata, 1992; Perkins & Leahy, 1980). Wolfe-Quintero et al. argue that for the number of error-free t-units to be effective, a time limit for completing the writing task needs to be set (as was done by most studies they investigated). Another measure that seems promising according to Wolfe-Quintero et al. is the number of error-free clauses. This measure has only been employed by Ishikawa (1995) to differentiate between proficiency levels. Ishikawa developed this measure with the idea that her beginning students were less likely to have errors in all clauses than in t-units, because the string is likely to be shorter. She found a significant improvement after three months of instruction.

The error-free t-unit ratio (error-free t-units per total number of t-units) or the percentage of error-free t-units has been employed by several studies to examine the relationship between this measure and proficiency. According to Wolfe-Quintero et al., twelve studies have found a significant relationship but eleven have not. Of the twelve significant studies, some investigated the relationship between the error-free t-unit ratio and program level (Hirano, 1991; Larsen-Freeman, 1978; Larsen-Freeman & Strom, 1977), test scores (Arnaud, 1992; Hirano, 1991; Vann, 1979) or grades (Kawata, 1992; Tomita, 1990). However, three studies relating to program level were not significant (Henry, 1996; Larsen-Freeman, 1983; Tapia, 1993). Some longitudinal studies were also not able to capture a significant increase in accuracy, indicating that the percentage of error-free t-units cannot capture short-term increases over time. Another accuracy measure, the error-free clause ratio (total number of error-free clauses divided by the total number of clauses), was used by only two researchers, with mixed results. Ishikawa (1995) chose this measure as a smaller unit of analysis for her beginner-level learners. She found a significant increase for one of her groups over a three-month period. Her other group and Tapia’s (1993) students all increased in this measure without showing a statistically significant difference. Another measure in this group is errors per t-unit (total number of errors divided by the total number of t-units). This measure has been shown to be related to holistic ratings (Flahive & Gerlach Snow, 1980; Perkins, 1980; Perkins & Leahy, 1980) but has been less successful in discriminating between program level and proficiency level (Flahive & Gerlach Snow, 1980; Homburg, 1984). Wolfe-Quintero et al. therefore argue that this might indicate that this measure does not discriminate between program level and proficiency level, but rather gives an indication of what teachers look for when making comparative judgements between learners. However, they argue that this issue needs to be examined in more detail. The last measure in this group is the errors per clause ratio (total number of errors divided by total number of clauses). The findings were the same as those of the errors per t-unit measure, showing that these two measures are more related to holistic ratings than to program level.
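The four accuracy ratios reviewed in this section reduce to two simple calculations applied to either t-units or clauses. The following is a minimal sketch, assuming that a text has already been segmented by hand and that each unit carries a hand-coded error count; the `Unit` class, the function names and the two sample scripts are hypothetical illustrations and are not taken from any of the studies cited. The sample scripts are constructed to show the point raised by Bardovi-Harlig and Bofman (1989): two texts can share the same error-free ratio while differing in the number of errors per unit.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One production unit (a t-unit or a clause) with a hand-coded error count."""
    text: str
    errors: int


def error_free_ratio(units):
    """Number of error-free units divided by the total number of units."""
    if not units:
        raise ValueError("at least one unit is required")
    return sum(1 for u in units if u.errors == 0) / len(units)


def errors_per_unit(units):
    """Total number of errors divided by the total number of units."""
    if not units:
        raise ValueError("at least one unit is required")
    return sum(u.errors for u in units) / len(units)


# Hypothetical, invented scripts: each list holds the t-units of one text.
script_a = [Unit("t-unit 1", 0), Unit("t-unit 2", 1),
            Unit("t-unit 3", 0), Unit("t-unit 4", 1)]
script_b = [Unit("t-unit 1", 0), Unit("t-unit 2", 3),
            Unit("t-unit 3", 0), Unit("t-unit 4", 2)]

for name, script in (("A", script_a), ("B", script_b)):
    print(name, error_free_ratio(script), errors_per_unit(script))
```

Passing a list of clauses instead of t-units yields the error-free clause ratio and the errors per clause ratio. Neither function weights errors by gravity, so the limitation noted for the second approach applies to this sketch as well.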

4.5.1.2 Fluency

Fluency has been defined in a variety of ways. It might refer to the smoothness of writing or speech in terms of temporal aspects; it might represent the level of automatisation of psychological processes; or it might be defined in contrast to accuracy (Koponen & Riggenbach, 2000). Reflecting the multi-faceted nature of fluency, researchers have developed a number of measures to assess fluency. Skehan (2003) has identified four groups of measures: breakdown fluency, repair fluency, speech/writing rate and automatisation. All these categories were developed in the context of speech rather than writing. They are, however, just as applicable to the context of writing. Breakdown fluency in the context of speech is measured by silence. In the context of writing this could be measured by a break in the writing process, which cannot be examined on the basis of the product alone. Repair fluency has been operationalised in the context of speech as reformulations, replacements, false starts and repetition. For writing, this could be measured by the number of revisions (self-corrections) a writer undertakes during the composing process (Chenoweth & Hayes, 2001). Kellogg (1996) has shown that this editing process can take place at any stage during or after the writing process. Another sub-category of fluency is speech/writing rate, a temporal aspect of fluency, operationalised by the number of words per minute. The final sub-group is automatisation, measured by length of run (Skehan, 2003). Only repair fluency and temporal aspects of writing (writing rate) can be measured on the basis of a writing product. Furthermore, writing rate can only be established if the product was produced under a time limit or if the time spent writing was recorded. That repair fluency and temporal aspects of fluency are separate entities has been shown by Tavakoli and Skehan’s (2005) factor analysis (Table 12).
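Writing rate, as described above, is the number of words divided by the time spent writing, and it is undefined when that time is unknown. The following is a minimal sketch under two assumptions that are not part of the source: words are counted by splitting the text on whitespace, and the writing time is supplied in minutes. The function names and the sample sentence are hypothetical.

```python
def word_count(text):
    """Count words by splitting on whitespace (a simplifying assumption)."""
    return len(text.split())


def writing_rate(text, minutes):
    """Words per minute; only defined when the time spent writing is known."""
    if minutes is None or minutes <= 0:
        raise ValueError("writing time must be recorded and greater than zero")
    return word_count(text) / minutes


# Hypothetical, invented sample: an 11-word text written in 2 minutes.
sample = "The weather was cold and we stayed at home all day"
print(word_count(sample), writing_rate(sample, 2))
```

Raising an error when no time is available mirrors the condition stated above: without a time limit or a recorded writing time, only the raw number of words can be reported.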

In the context of writing, Chenoweth and Hayes (2001) found that even within a period of only two semesters their students displayed a significant increase in writing fluency. This included an increase in burst length (automatisation), a decrease in the frequency of revision (repair fluency), and an increase in the number of words accepted and written down (writing rate).

One measure that can be used to investigate temporal aspects of fluency is the number of words which, according to Wolfe-Quintero et al. (1998), has produced rather mixed results. According to their analysis, eleven studies found a significant relationship between the number of words and writing development, while seven studies did not. However, this measure might be more reliable if it is applied to writing that has been produced under time pressure. Kennedy and Thorp (2002), who investigated the differences in writing performance at three different IELTS levels, found a difference between essays at levels 4, 6 and 8, with writers at level 4 struggling to meet the word limit. However, they also report a large amount of overlap between the levels. Cumming et al. (2005), in a more recent study focussing on the next generation TOEFL, found statistically significant differences only between essays at levels 3 and 4 (and levels 3 and 5), but no differences between levels 4 and 5. The descriptive statistics indicate a slight increase in the number of words between levels 4 and 5. Another interesting measure to pursue might be the number of verbs. This measure has only been used once (Harley & King, 1989) in a study which compared native and non-native speakers and which produced significant results. However, it has never been used to differentiate between different proficiency levels.

No studies of the writing product have investigated repair fluency. The number of self-corrections, a measure mirroring the number of reformulations and false starts in speech, might be a worthwhile measure to pursue in this study.

4.5.1.3 Complexity

The importance of grammatical and lexical complexity in academic writing has been pointed out by Hinkel (2003), who argues that investigations into L2 texts have shown that in large-scale testing and university-level assessments, shortcomings in syntactic and lexical complexity in students’ writing are often considered a severe handicap. According to her, research has shown that raters often criticize simple constructions and an unsophisticated lexicon, a consideration that might reduce the score awarded (Reid, 1993; Vaughan, 1991). Furthermore, L2 writers’ range and sophistication have been shown to be reliable predictors of overall Test of Written English scores (Frase, Faletti, Ginther, & Grant, 1999).

Ellis and Barkhuizen (2005) suggest that complexity can be analysed according to the language aspects they relate to. These could include interactional, propositional, functional, grammatical or lexical aspects. As propositional and functional complexity are hard to operationalize and interactional complexity is a feature of speech, only grammatical and lexical complexity will be considered here (following Wolfe-Quintero et al., 1998).

4.5.1.3.1 Grammatical complexity

Grammatical complexity is concerned with grammatical variation and sophistication. It is therefore not important how many production units (like clauses or t-units) are present in a piece of writing, but rather how complex these are.

The measures that have been shown to most significantly distinguish between proficiency levels, according to Wolfe-Quintero et al. (1998), seem to be the t-unit complexity ratio, the dependent clause per clause ratio and the dependent clause per t-unit ratio (with the last two producing rather mixed results in previous studies).
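All three ratios named above can be derived from three counts per text: the number of t-units, the number of clauses and the number of dependent clauses. The following is a minimal sketch, assuming that these counts come from manual coding and taking the t-unit complexity ratio as the number of clauses per t-unit; the `ClauseCounts` class, the function names and the sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClauseCounts:
    """Hand-coded counts for one text (hypothetical container)."""
    t_units: int
    clauses: int
    dependent_clauses: int

    def __post_init__(self):
        # Both denominators must be positive for the ratios to be defined.
        if self.t_units <= 0 or self.clauses <= 0:
            raise ValueError("t_units and clauses must be greater than zero")


def t_unit_complexity_ratio(counts):
    """Clauses per t-unit."""
    return counts.clauses / counts.t_units


def dependent_clauses_per_clause(counts):
    """Dependent clauses divided by all clauses."""
    return counts.dependent_clauses / counts.clauses


def dependent_clauses_per_t_unit(counts):
    """Dependent clauses divided by t-units."""
    return counts.dependent_clauses / counts.t_units


# Hypothetical, invented counts: 10 t-units, 20 clauses, 10 of them dependent.
sample = ClauseCounts(t_units=10, clauses=20, dependent_clauses=10)
print(t_unit_complexity_ratio(sample),
      dependent_clauses_per_clause(sample),
      dependent_clauses_per_t_unit(sample))
```

Because the ratios share the same counts, they are not independent of one another, which is worth keeping in mind when more than one of them is reported for the same text.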

The t-unit complexity ratio (number of clauses per t-unit) was first used by Hunt (1965). A t-unit contains one independent clause plus any number of other clauses (including adverbial, adjectival and nominal clauses). Therefore, a t-unit complexity ratio of two would mean that on average each t-unit consists of one independent clause plus one other clause. Wolfe-Quintero et al. (1998) point out that in L2 writing not all sentences are marked for tense or have subjects. They argue that it is therefore important to include all finite and non-finite verb phrases in the t-unit (as was done by Bardovi-Harlig & Bofman, 1989). This would change the measure to a verb phrases per t-unit measure. They argue that it would be useful to compare which of these measures is more revealing. The t-unit complexity ratio was designed to measure grammatical complexity, assuming that in more complex writing there are more clauses per t-unit. However, in second language research, there have been mixed results. Hirano (1991) found a significant relationship between the t-unit complexity ratio and program level, as did Cooper (1976) and Monroe (1975) between this measure and school level, and Flahive and Snow (1980) found a relationship between this measure and a number of their program levels. However, other studies (Bardovi-Harlig & Bofman, 1989; Ishikawa, 1995; Perkins, 1980; Sharma, 1980) obtained no significant results. For example, Cumming et al.’s (2005) detailed analysis of TOEFL essays resulted in a similar number of clauses across proficiency levels. The means ranged from 1.5 to 1.8 for the

Source: Ute Knoch, Diagnostic Writing Assessment: The Development and Validation of a Rating Scale (LTE 17). Peter Lang, pp. 81-109.