4 Approaches to defining constructs for language tests

4.5 Theoretical models of language ability for testing purposes

4.5.1 Componential models

Componential models make a distinction between various components in language ability to describe it conceptually. The value of these models for test development is that the salient components they identify can be used as guiding principles in the construction of comprehensive assessment tests and tasks. It is likely that only the parts of a generic model which are relevant for a particular assessment situation will be implemented in any single test, but the comprehensiveness of a componential model can nevertheless support the systematicity of the planning of a test. The model could also be used as a quality criterion for how well a particular test covers whichever areas are relevant for the purpose for which the scores are being used while indicating the areas that are not covered by the test.

The best-known componential model of language ability in language testing is Bachman and Palmer’s (Bachman 1990, 1991; Bachman and Palmer 1996) model of Communicative Language Ability (CLA). The CLA model identifies the characteristics of an individual that are engaged when he or she uses language. These are language knowledge, topical knowledge, and personal characteristics, mediated through, and interacting with, affective factors and strategic competence (Bachman and Palmer 1996:62-63).

Language knowledge in the Bachman and Palmer model is divided into organizational knowledge, which consists of grammatical and textual knowledge, and pragmatic knowledge, which subsumes illocutionary and sociolinguistic knowledge. Each of the components has further subdivisions. Among the other knowledges which are relevant for language use according to the CLA model, topical knowledge comprises the knowledge about the topic that the individual brings to an interactional situation, while personal characteristics are basic features of the person such as sex, age, and native language. Affective factors embody emotional responses to the communication situation, while strategic competence comprises metacognitive organisation and monitoring of the communication situation.

Furthermore, Bachman and Palmer maintain that the nature of language ability must be considered “in an interactive context of language use” (1996:62) rather than solely on the basis of the characteristics of an individual. Thus it is clearly an interactionalist theory. Bachman and Palmer (1996) propose a checklist for the description of tasks that guides a test developer through a close description of the setting and the language characteristics of the task. It includes the setting of language use in terms of physical characteristics, participants, and time of task and a close linguistic description of the characteristics of the test rubrics, task input (or task material), expected examinee response, and relationship between input and expected response (1996:49-50). This implements an analysis of the contextual factors that Bachman and Palmer consider relevant for the modelling of language skills in tests. The authors also promote the use of a similar checklist to analyse which aspects of language ability the test covers (Bachman and Palmer 1996:76-77).

Bachman (1990:81) points out that the CLA model builds on earlier work on communicative competence by Hymes (1972), Munby (1978), Canale and Swain (1980), Canale (1983), and Savignon (1983). Like its predecessors, the CLA model tries to describe the nature of language ability comprehensively and in general terms. The CLA model is thus grounded in theoretical thinking, but the model has also been influenced by empirical results from a multitrait-multimethod study (Bachman and Palmer 1982). The study investigated a hypothesized model with three traits, ie. linguistic competence, pragmatic competence, and sociolinguistic competence, using oral interview, writing sample, multiple choice, and self-rating as methods. The findings supported a partially divisible model in which sociolinguistic competence was separate from the other two traits. The findings also indicated relatively strong method effects, which is reflected in the current importance of task characteristics in the CLA framework. However, whereas the earlier test method characteristics were considered undesirable influences on scores, the broader range of task characteristics and the inclusion of textual and discourse features indicate a different attitude to contextual factors in the 1996 version of the theory. The only area that is not developed very far in the CLA model in relation to Chapelle’s (1998:52) model reproduced in Figure 1 is the area of fundamental processes. In their 1996 book, Bachman and Palmer (1996:62) specify that the CLA model is “not … a working model of language processing, but rather … a conceptual basis for organising our thinking about the test development process.”

For test development purposes, the complexity and detail of Bachman and Palmer’s model yield checklists to characterise the nature of the test and a guideline to develop assessment criteria. To describe the setting of language use, the test developer is encouraged to describe the physical characteristics of the language use situation, eg. location, noise level, and lighting; the participants in their roles, eg. teachers, classmates, friends; and the time of the task, ie. daytime, evenings, and/or weekends. Similar categorisations exist for the description of other features of the test, including the language of the task, where the test developers describe the grammatical, topical, and functional characteristics of the task material and the expected response. When parallel descriptions are developed for the test tasks and the non-test tasks to which the test is supposed to be relevant, the quality of the test can be assessed, and if significant differences are found, the test developers can try to find a better test method. In other words, test developers can use these tools to state what their test tests and to judge its quality.
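The comparison of parallel descriptions can be illustrated with a minimal sketch. It assumes that each task is described as a simple mapping from characteristic to value; the function name, the characteristic names, and the example values are hypothetical and are not taken from Bachman and Palmer's checklist, which is a descriptive framework rather than a computational procedure.

```python
# Illustrative sketch only: characteristic names and values are hypothetical,
# loosely based on the setting categories mentioned in the text.

def differing_characteristics(test_task, target_task):
    """Return the characteristics whose descriptions differ between two tasks.

    Each task is a dict mapping a characteristic name to its description.
    A characteristic missing from one description is reported with None.
    """
    names = sorted(set(test_task) | set(target_task))
    return {
        name: (test_task.get(name), target_task.get(name))
        for name in names
        if test_task.get(name) != target_task.get(name)
    }


test_task = {
    "location": "classroom",
    "noise_level": "quiet",
    "participants": "examiner",
    "time_of_task": "daytime",
}
target_task = {
    "location": "classroom",
    "noise_level": "noisy",
    "participants": "classmates",
    "time_of_task": "daytime",
}

mismatches = differing_characteristics(test_task, target_task)
for name, (in_test, in_target) in mismatches.items():
    print(f"{name}: test task = {in_test}, non-test task = {in_target}")
```

Whether a listed difference is significant remains a judgement for the test developers; the sketch only makes the differences visible.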

According to Bachman and Palmer (1996:193), the measurement process, or the process which produces the scores, consists of three steps: defining the construct theoretically, defining the construct operationally, and establishing a method for quantifying responses. The first step is accomplished by describing the construct in detail through the frameworks discussed above. The second step entails writing test blueprints and actual test tasks. The third step comprises the production of a scoring mechanism for the test. For receptive tests, this means defining criteria for correctness and deciding whether binary 0/1 scoring or partial credit scoring is to be used. For speaking and writing, this involves the creation of the rating scales. For both procedures, furthermore, the test developers need to decide whether to report the test results as they come from the assessment process or whether some score conversion and combination is to be used.
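The difference between binary and partial credit scoring of a receptive item can be shown with a small sketch. It assumes that a response is either compared with a single key (binary) or credited in proportion to the keyed elements it contains (partial credit); both function names and the example item are hypothetical, and other partial credit schemes are equally possible.

```python
# Illustrative sketch only: one possible way to operationalise the two
# scoring options for a receptive item. Names and data are hypothetical.

def binary_score(response, key):
    """Return 1 if the response matches the key exactly, otherwise 0."""
    return 1 if response == key else 0


def partial_credit_score(response_elements, key_elements):
    """Return the proportion of keyed elements found in the response."""
    key = set(key_elements)
    if not key:
        raise ValueError("the key must contain at least one element")
    return len(key & set(response_elements)) / len(key)


# A multiple-choice item scored 0/1.
print(binary_score("b", "b"))
print(binary_score("c", "b"))

# A listening item where four details are keyed and two are reported.
keyed_details = ["train", "9 am", "platform 2", "Leeds"]
reported_details = ["train", "9 am"]
print(partial_credit_score(reported_details, keyed_details))
```

Under binary scoring the second response would earn nothing unless all four details were required and given, whereas partial credit scoring records that half of the keyed information was understood.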

As for the assessment scales, Bachman and Palmer (1996) argue for the use of analytic scales of a specific type, which they call “criterion-referenced ability-based analytic scales” (1996:213). The scale categories are derived from their componential view of language ability and the scale levels are defined in terms of quantity, from ‘no evidence of’ to ‘evidence of complete knowledge of’ whatever category is in question (1996:211). The scale for knowledge of syntax (Bachman and Palmer 1996:214), for instance, ranges from “no evidence of knowledge of syntax” through “evidence of moderate knowledge of syntax” with “medium” range and “moderate to good accuracy within range” where, “if [the] test taker attempts structures outside of the controlled range, accuracy may be poor” to “evidence of complete knowledge of syntax” with “no evidence of restrictions in range” and “evidence of complete control except for slips of the tongue”.

Bachman and Palmer (1996:211-212) propose this type of assessment scale in contrast to global scales of language ability and argue that their approach has two advantages. Through analytic scales, testers can indicate the test taker’s strengths and weaknesses, and such profile scoring reflects what raters actually do when they rate, since the features which raters take into account when rating are expressed separately in their scales. Compared with global scales, these are indeed advantages, but compared with other analytic scales, Bachman and Palmer’s approach is quite abstract. The scale is defined without any reference to actual language use situations and the level descriptors include no examples of learner language. The authors state that the introduction to the scale should include definitions of “the specific features of the language sample to be rated with the scale” (1996:213). However, the authors’ own illustration of this for the syntax scale is as abstract as the level descriptors: “evidence of accurate use of a variety of syntactic structures as demonstrated in the context of the specific tasks (as specified in the task specifications) that have been presented” (Bachman and Palmer 1996:214).

Bachman and Palmer’s scale definitions cohere well with their framework for test construction and their model of communicative language ability because the same categories are used. However, from the point of view of a test development board, they raise two concerns. The first is conceptual: the view of language learning that these scales implement seems to build solely on quantitative increase, which test developers may find difficult to accept and certainly difficult to apply in giving detailed feedback. Such issues can only be resolved through empirical investigation, and Pavlou’s (1995) study provides one example. He applied the Bachman and Palmer scale for register, which posits that ability develops from no register variation through a good control of one register and an inconsistent control of another, to a consistent control of a range of registers. Pavlou’s analysis of learner performances at different levels indicated that the important variable was not the number of registers commanded but the appropriacy and consistency of choice of register for the task setting. If such a modification to the scale were made, it would formalise the effect of contextual factors in performance ratings. It is possible that Bachman and Palmer (1996) did not consider research to be far enough advanced yet to allow such modification. A continued dialogue about different kinds of scales applied to different tests, performances and ability levels might clarify whether scale differences are due to different views of language and ultimately unsolvable through empirical evidence, or whether a consensus could be reached about a practical application of the concept of register to an assessment scale in the context of an interactional theory of language.

The second concern that Bachman and Palmer’s scales raise for test developers is a practical worry that scale descriptors which only include brief quantitative phrases, eg. distinguishing between a “small”, “medium”, and “large” range of grammatical structures, are too abstract to support agreement between dozens of raters who may be working on their own after initial training. As with the previous worry, this criticism should be substantiated through empirical evidence. An ideal start for such evidence would be for a large-scale examination to implement scales of the Bachman and Palmer type; otherwise, the effort of using two parallel assessment systems might be too demanding in practical terms.

Davies (1996) observes that while language testers often refer to the Bachman and Palmer model, they tend to acknowledge rather than apply it. Chalhoub-Deville (1997:13-14) makes the same observation in slightly more positive terms, contending that the Bachman and Palmer model, like other theoretical models of language ability, can be used to express the extent to which a contextualised assessment instrument covers a context-neutral model of general language ability. In other words, the model is too comprehensive and possibly too abstract to be implemented in its entirety. I will return to Chalhoub-Deville’s discussion of theoretical models versus contextualised assessment frameworks later in this chapter.

Extrapolating from the discussion around the Bachman and Palmer model, it seems that componential models of language ability can support detailed construct description and creation of a coherent examination where the theoretical construct definition, the operational definition in task specifications, and the measurement definition in assessment principles all go together. The model does not pose rules for the degree of detail that test developers should use to describe the construct; it offers a support structure that developers can use if they so decide. The model does not require that the construct be made the driving rationale for test development and validation, but it enables test developers to do this if they wish. The basis of generalization that componential models offer for test scores builds on the model’s components. In the case of the CLA, these are the language learner’s syntactic, textual, functional, and pragmatic knowledges, combined with their personal characteristics and world knowledge and mediated through their affective response and strategic competence. The authors emphasize that it is important to consider these constructs in the context of language use, which they define through task characteristics. These are the important constructs in language testing, according to this model.