aside for cohesive groups to really take items and specifications apart in critical discussions. The purpose is to ensure that only robust items emerge from the process, for which there is wide agreement that the item type will not only work, but that it will elicit a response that provides valuable information on the construct of interest. Initial evaluation is undertaken by the test developers themselves, usually with help from other applied linguists or teachers with a range of experience in teaching and assessing.
In order to illustrate this I am going to use an example from a real test specification workshop, conducted for Oxford University Press, similar to the one described in Fulcher and Davidson (2007: 316–317). The context is the development of a computer-delivered placement test. The project brought together more than twenty experienced teachers and item writers. The workshop was divided into a number of stages, as follows:
Stage 1 Groups of teachers are formed and engage in an ice-breaking activity.
Stage 2 Review test constructs and create task/item specifications with sample items.
Stage 3 Groups swap sample items but not specifications. Each group attempts to reverse engineer the sample item from the other group.
Stage 4 Groups are given the original task/item specification and asked to critique the sample item in preparation for giving feedback to the group that designed the item.
Stage 5 Plenary session in which each group receives feedback on specifications and items, then responds to the critique.
Two concepts are critical to this process. The first is reverse engineering, and the second is item–spec congruence (or item–spec fit). We have already encountered reverse engineering in previous chapters; as a group evaluation technique it is a very powerful tool.
Although there are different types of reverse engineering (see Fulcher and Davidson, 2007: 57), the most common is critical reverse engineering, in which we take a sample item and analyse it to ask what it is testing, whether it is a useful item, and consider what problems we might face if we use the item. The outcome may be to revise an item, or to abandon it completely. Item–spec congruence is particularly relevant to Stage 4. Here, a group sees the original item specification and checks to see whether the item could reasonably have been generated from the specification, and whether they have been able to reverse engineer the general description. The group has to consider whether an item and its specification are both congruent and useful.
We will begin by considering Stage 2 briefly. The participants had been asked to consult a range of sources on listening and reading constructs. These sources included books and articles, as well as models like those we have discussed in Chapter 5, including the CEFR and the Canadian Language Benchmarks. In Stage 2 the groups focused on which constructs would be most relevant for a placement test to be used in a language school that is following a particular syllabus with an associated set of materials. Many constructs were selected and agreed upon; one of these was ‘ability to identify facts in short, clear, simple messages and announcements’. One of the groups was given the task of designing a listening item type to test this construct.
Evaluating items, tasks and specifications 161
Here is the item that was produced by the design group. Remember that this item is to be presented on a computer, so answers require the manipulation of the mouse and keyboard.
Tapescript (from the teaching materials)
Woman: Yes?
Man: I’d like some information about the rock concert tonight.
Woman: Certainly. How can I help?
Man: Where is it on?
Woman: At the Regent Theatre in Bank Street.
Man: What time does it start?
Woman: At seven-thirty.
Man: And how much are the tickets?
Woman: Well, the ten-euro tickets are all sold – the only ones we have left are fifteen euros.
Man: That’s fine – I’ll have those.
Woman: How many would you like?
Man: Four, please.
Woman: We have four in the front row or in the middle of the theatre.
Man: I’ll take the ones in the middle, please. The front row will be too close to the stage.
We would make this an answerphone message or recorded announcement covering the message.
Click on the word or number which is not correct on each line. Key in the correct information in 1 to 6 below. You will hear the recording twice. You can key in your answer at any time. Once you have heard the recording, you will have 60 seconds to fill in your answer.
City Ticket Agency
0 Event Jazz concert
1 Place Regent Cinema
2 Address Bank Road
3 Time 8.30
Tickets bought
4 Price 10 euros
5 Number 5
6 Seat(s) row front
0 rock
1 ______
2 ______
3 ______
4 ______
5 ______
6 ______
You have already been told what this item is supposed to test. Remember that the evaluation group only had access to the item and nothing else during Stage 3 of the workshop.
Before we move on to discuss Stage 3, you may wish to spend some time writing your own critique of this sample item. You can then compare your own views with those of the group.
Rather than simply listing a set of questions or criticisms that came out of Stage 3, I am going to present the transcript of the discussion with annotations. The reason for this, following Davidson and Lynch (2002), is that item development and review must be seen as a collaborative group activity. Individuals do not always see problems with items or tests. The problems and solutions emerge in discussion and debate, and good specifications evolve in the process. The following transcript is not exact. I have not recorded all overlapping speech, or attempted to transcribe hesitations, false starts, and so on. At points the discussion drifted from topic, and I have removed those sections that were not directly relevant. Nevertheless, the transcript does accurately reflect what was said in the workshop. As you read through the transcript, consider what ideas are being generated, where agreement starts to form, and where disagreements remain. At various places there are observations within text boxes to bring out salient points.
For ease of reading, our four participants in the discussion are Angela, Bill, Carol and Dave, although these are not their real names. We join them in Stage 3, in which they have been asked to reverse engineer the specification for the task.
Angela: We would make this an answerphone recorded message covering the conversation the information
Bill: what’s it say?
Carol: Oh they’ve taken that dialogue, that’s interesting, isn’t it?
Angela: we would make this an answerphone recorded message covering the information
Bill: okay, yeah
Angela: click on the word or number which is not correct
Bill: just one that’s not correct
Angela: on each line
Bill: oh okay
Dave: hm hm and the lines are
Angela: Key in the correct information in one to six below
Dave: so we need an input box as well
Carol: alright then we need to see this really
Angela: click on the word or number which is not correct
Bill: okay so I think what you have to do is well that’s quite complicated isn’t it? I mean technically. But I imagine what you’re doing is in each line you have to kind of like select a word that’s not right so jazz is wrong it’s rock and you have to select ‘regent’ and change it to ‘odeon’ or something. That’s what it is isn’t it?
Carol: key in the correct information in one to six so alright
Bill: so the next one is theatre not cinema
Carol: so you click on that and then put in what it should be here. You will hear the recording twice. You can key in your answer at any time. Once you have heard the recording you will have sixty seconds to fill in your answers. It’s very complicated.
Bill: It’s very complicated it’s very complicated to achieve as well
Carol: why doesn’t it it’s more than yeah what it’s doing is actually very simple isn’t it? It’s just correct the answers
Bill: yeah
Carol: what’s the point of clicking on it and then typing the thing in why can’t you just
Angela: well exactly it seems to be a very long-winded way of just selecting the answers
Carol: you’ve actually got to hear it without any prompt haven’t you? So is it notes? Can I just have a look is it is it erm so presumably you get a few seconds to read it through the city ticket agency jazz concert I’d like some information about the rock concert tonight you select jazz
Bill: technically it’s quite difficult isn’t it because you’ve got to have selectable text and you’ve got to have
Carol: and something that you can key
Bill: key in boxes it’s technically quite hard
Carol: hm it seems uneconomical
Bill: procedurally very tough yeah
Carol: for what you’re getting out of it
Angela: so why don’t you just give them two words and they click on the right one?
Carol: hmmm
Bill: yes exactly yes
Dave: so you’ve got rock jazz and you just click on one of these
Angela: just click on it yes
Looking at a dialogue of an item review is fascinating from many points of view. In the preceding section the lexical cohesion between turns and the ways in which members contribute to building consensus are particularly interesting. You may wish to mark the text to show how this happens; for example, by highlighting each use of the words ‘complicated’, ‘complex’, ‘difficult’, ‘hard’ and ‘simple’. In the opening discussion the group has not attempted to identify the intended construct. Rather, they are clearly having difficulty understanding just what it is the test taker has to do in response to the item. In focusing on the response attribute they also become involved in the delivery specification. It seems to them that just producing this item in a computer environment is likely to be difficult. But the difficulty is not just in the technicalities; it is also in the ‘procedure’ that the test takers are being asked to follow, as Bill makes clear.
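For readers who prefer to do this marking exercise with a script rather than a highlighter, the following is a minimal sketch in Python. It assumes that a handful of turns from the transcript above have been typed in by hand as (speaker, utterance) pairs; the names `KEYWORDS`, `TURNS` and `keyword_counts` are illustrative only and were not part of the workshop materials.

```python
import re
from collections import Counter

# The five words suggested in the text for tracing lexical cohesion.
KEYWORDS = {"complicated", "complex", "difficult", "hard", "simple"}

# A few turns excerpted from the transcript above, as (speaker, utterance).
TURNS = [
    ("Carol", "It's very complicated."),
    ("Bill", "It's very complicated it's very complicated to achieve as well"),
    ("Carol", "what it's doing is actually very simple isn't it?"),
    ("Bill", "technically it's quite difficult isn't it"),
    ("Bill", "key in boxes it's technically quite hard"),
]

def keyword_counts(turns):
    """Return {speaker: Counter of keyword uses} for the given turns."""
    counts = {}
    for speaker, utterance in turns:
        # Split the utterance into lower-case alphabetic tokens.
        words = re.findall(r"[a-z]+", utterance.lower())
        hits = [w for w in words if w in KEYWORDS]
        counts.setdefault(speaker, Counter()).update(hits)
    return counts

counts = keyword_counts(TURNS)
for speaker, tally in counts.items():
    print(speaker, dict(tally))
```

A tally of this kind only shows who repeats which word. It does not replace reading the turns in sequence to see how one speaker picks up and extends another's wording.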
We rejoin the discussion, where it takes a very interesting turn.
Bill: because you wouldn’t have to write it out you might
Angela: because you would have a difficulty here you’ve got the spelling
Dave: so do you think then they’ve made this a spelling test? If they’re not giving you the answers written then it’s also a spelling test
Angela: yes but it’s also a hearing test
Dave: yes but it’s also but they are giving you the answers I think they are giving you the answers down the bottom
Carol: are they
Dave: one two three four five six will have alternatives won’t they
Bill: no they’re writing boxes they’re empty fields
Angela: you have sixty seconds to write in
Carol: where do so so you have to remember what they were I mean you can write this down or
Angela: it’s very confusing
Carol: because you may know what’s wrong but then you’ve got to remember what’s right and then you put the answers in and you can put them in any time
Angela: shall we try it? I’ll read it and you try doing it
Carol: yeah
Angela: first of all you have to identify I suppose you mark on there
Carol: I suppose you identify it on the first listening and answer on the second I think
At this point Angela attempts to read out the prompt and the dialogue while the others attempt to answer the item. However, we note that in this short period of discussion Angela and Carol have raised two serious problems with the item as their understanding of it develops. The first is that by getting the students to type the correct response into a box the item may be testing spelling. This is a computer-based test, and the computer will score the answers. But the item is clearly a listening item – the group appears to agree on this even though it hasn’t been explicitly stated. The second issue is more subtle. Carol has seen that typing in the correct answer can only occur after listening to the text all the way through. This means that the test takers must remember the correct answer for each incorrect word. The implication of this observation is that the item is likely to be sensitive to short-term memory capacity, and the test is not meant to be a memory test. The group has identified two potential threats to score meaning from construct-irrelevant variance.
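The spelling problem can be made concrete with a small sketch. The workshop materials do not say how the computer would score typed answers, so the code below simply assumes exact string matching against a key (ignoring case and surrounding spaces); the function names and the misspelt response are hypothetical. It contrasts that approach with the click-on-one-of-two-words format that Angela suggested earlier.

```python
def score_exact(response: str, key: str) -> int:
    """Score 1 only if the typed response matches the key exactly
    (assumed rule: ignore case and leading/trailing spaces)."""
    return int(response.strip().lower() == key.strip().lower())

def score_selected(selected: str, key: str) -> int:
    """Score a click-to-select response: one of the offered words is chosen."""
    return int(selected == key)

# Line 1 of the sample item: 'Cinema' should be corrected to 'theatre'.
key = "theatre"

# Two hypothetical test takers who both heard the word correctly.
typed_a = "theatre"   # spelled as in the key
typed_b = "theater"   # same word heard, different spelling typed

exact_a = score_exact(typed_a, key)
exact_b = score_exact(typed_b, key)

# Alternative format: offer two words and let the test taker click one.
options = ("cinema", "theatre")
selected_a = score_selected(options[1], key)
selected_b = score_selected(options[1], key)

print("typed responses:   ", exact_a, exact_b)
print("selected responses:", selected_a, selected_b)
```

Under these assumptions the two listeners receive different scores on the typed version even though both understood the recording, which is the kind of variance the group is worried about. The selection format removes the spelling demand, although it leaves the memory issue raised by Carol untouched.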
We rejoin the conversation after the group has had the opportunity to do a ‘try-out’ of the item.
Carol: It’s all very complicated
Angela: I think it’s fine but I think it just needs simplifying
Bill: It’s a variation on a blank fill really isn’t it
Angela: yes
Bill: where you’ve got to identify which blanks which blanks to fill in
Dave: so general description this would be identifying specific information or is there a special phrase or
Angela: listening for specific information but it’s correcting wrong information
Bill: yes correcting yeah or just correcting information
Carol: they’re not even similar sounding words are they they’re just totally different things it’s not like
Dave: but are we supposed to use this tapescript here?
Bill: yes but
Carol: I’d like some information about the rock concert tonight
Angela: rock
Carol: jazz
Angela: so then you have to what do you do then?
Dave: well they say they’re going to make it a monologue aren’t they it’s going to be an answerphone message okay so they take that information about the concert and say ‘hi this is Bob. I wonder if you want to come to the jazz concert tonight it’s at the Leicester Square cinema’
Carol: Hang on a minute it’s an answerphone it’s a recorded answerphone message isn’t it
Angela: trouble is an illustrative item or task reflects the specification it’s difficult to do a listening without the script isn’t it
Bill: well we can imagine it’s not hard to imagine from that ‘yes hello this is the regent cinema tonight’s concert is a rock concert which starts at eight fifteen and the tickets are eleven euros fifty the ten euro tickets are all sold’
Angela: So it’s the ability to identify the wrong information and replace it with the correct information
In this section it is interesting to see that members of the group have made very different assumptions about what the answerphone message is. Dave assumes that it is an invitation left on the answerphone of the test taker, while Bill’s explanation is that it is a recorded message from the cinema. This appears to be ambiguous in the sample item because the designers have presented a dialogue and not rendered it in the genre required. Despite this serious problem, the group appear to agree about what the item is designed to test. Bill’s interpretation and the conclusion summarised by Angela at the end of this section show that they have managed to discern the intentions of the item designers.
The discussion drifts back to whether this item can really be called a ‘gap fill’ or a ‘cloze’, and they decide that it doesn’t really fit into either category. We return to the discussion as the group begins to look at the prompt attribute.
Angela: Where does that leave us then? Prompt attributes. So you need a recorded answering machine message of around you want a word limit
Dave: fifty words would be more than enough
Angela: in order to extract information thirty seconds is one hundred and ten words in listening
Dave: yes yes you’re right
Angela: so one hundred and ten to one hundred and twenty or something it’s difficult to get the exact number of words but you need a kind of parameter and that would be a short snippety listening
Bill: and then the students correct the notes
Angela: no I think you need to say students read the text and then listen
Dave: no they read the notes
Angela: well notes yeah the information in the task they need a ten second period they read that and they
Carol: listen
Angela: to the recorded message presumably as many times as they like
Carol: twice
Angela: does it say twice?
Carol: it says twice
Angela: I mean is the first listening to do the task and the second listening to check what they’ve done or
Carol: well surely the first one is to to identify the mistake and the second one to write the answer
Angela: but don’t you want to do that together
Carol: they can do it any time can’t they they can put it in any time
Angela: you will hear the recording twice and key in the answer any time once you have heard the recording you will have sixty seconds
Carol: that’s very difficult because presumably you have time to identify and then write it in so you’re going to have to remember what the answers are
Dave: six lines six questions there’s one question for each line so that makes it a bit easier so it’s not just a completely random set of notes so you know that in each line there’s one error that you have to correct
Carol: these would be better off next to
Angela: yeah they would I think it’s a bit of a layout problem
Carol: and you’ve only got two words to choose from haven’t you?
Dave: yeah you have
Carol: which word is wrong what should it have been and that’s it
Bill: once you get to the second part the first part ceases to be of any relevance
Dave: does clicking on the incorrect word have any purpose?
Carol: that’s what I’m wondering was it just for them to be able to remember what it was
Bill: it would make more sense to provide a task with the word underlined so they could listen for what the correct word is
It was at this point that the group was given a short break. They were then given the specification for the sample item. The specification was written using a Popham-style template, which we reproduce here.
Title of Specification: Listening Correction Task
General Description