
(ii) 'Action' items (partial) (iii) 'Action' items (completion)

as described above. (The term 'action' was used as most of the initial items of this type were concerned with 'what would you do in X circumstances?' and the label has stuck.) Certain aspects of the types of item selected above are clearly arbitrary. One may ask "why four out of ten and not six out of thirteen?" or "why 5-answer multiple choice and not 4-answer?". This type of question has no real response. The selection was arbitrary, but based upon judgemental assessments of how much of the page should be covered with text (too many options may increase the visual complexity of the task) and what might take the 'sting' out of the questions. The passages to read are novel and, if one is also to introduce novel item-types, one has a responsibility to ensure the pupil has as much chance to answer the question as possible without extraneous distractions. One runs the risk, otherwise, of asking questions more difficult than the text.

Item Analysis

Even before constructing test items, it is necessary to know how they are to be analysed and what restrictions, if any, the form of analysis might impose on item construction. One must select from the techniques available those best suited to the task: although a number of classical test statistics are not appropriate for use in criterion-referenced measurement, some undoubtedly can contribute, and a number of other techniques can also be used or devised.

Classical procedures involve, in particular, the investigation of the difficulty and the discriminative ability of each item and each distractor, typically involving point-biserial correlations. "In practical test construction, the variability of test scores is increased by manipulating the difficulty levels and content of the test items" (Glaser, 1963). As suggested above, non-discriminating items are those that fail to contribute to variability in the total test scores of testees, and this is usually due to the item being too easy or too hard or to its being ambiguous in some way. The question of ambiguity will still persist in CRM, although testee responses to ambiguous questions might perform useful diagnostic functions, but the difficulty of an item is far less important. If the item reflects some important aspect of the criterion, then it is a 'good' item, regardless of its ease or difficulty. For practical purposes, however, the test constructor will be less than happy if testees always get all of his items correct. It removes the point of testing in the first place: he must either shift his criterion or give up testing.

What, however, if an item is correctly answered more often by those with lower total scores than by those with higher total scores, i.e. a negative discriminator? Clearly, this item will be unacceptable in CRM also: in effect it is contributing to the probability of false

Wedman (1974) p. 113; Smith (1974) p. 144).

Despite problems in the calculation of correlational values associated with low variability (see Chapter 4 above, or Lord & Novick (1968) p. 129), discrimination indices, used in the sense above, may still be calculated in CRM. Low variability will tend to decrease the value toward zero, but it will not change the sign, and it is the sign in which one is most interested. Of course, ambiguity can also be indicated by non-discrimination, and one must combine a measure of score variability with the discrimination index when considering this aspect of item analysis.
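The point about the sign of the index can be illustrated with a minimal sketch of a point-biserial calculation. This is not the procedure used in the project; the function name `point_biserial` and the sample scores are assumptions made for illustration, and the total score is taken here to include the item itself.

```python
from math import sqrt


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Returns None where the index is undefined: every testee right, every
    testee wrong, or no variability at all in the total scores.
    """
    n = len(item_scores)
    if n == 0 or n != len(total_scores):
        raise ValueError("need two equal-length, non-empty score lists")
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not passed or not failed:
        return None
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd == 0:
        return None
    p = len(passed) / n  # proportion answering the item correctly
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / sd * sqrt(p * (1 - p))


# Hypothetical data: four testees, item marked 1 (correct) or 0 (incorrect).
totals = [10, 9, 5, 4]
positive = point_biserial([1, 1, 0, 0], totals)  # high scorers get it right
negative = point_biserial([0, 0, 1, 1], totals)  # low scorers get it right
```

In this sketch a negative return value flags the negative discriminator discussed above, whatever the size of the value.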

The question of the item 'distractors' must also be raised. As was suggested above, the role of other answer possibilities in CRM is not to lure a certain proportion of testees away from the correct answer possibility but to provide somewhere for the 'non-master' to choose, to lessen guessing and to provide diagnostic information. Hence, an underused possibility is not a poor possibility, merely an aspect of quite an easy question. The selection of one answer possibility rather more often than the others and, especially, more often than the correct one, may be an indicator of ambiguity, particularly in a non-discriminating item. So, whilst the calculation of discrimination indices is not called for in the case of each answer possibility, an indication of the numbers choosing each may be valuable. Further, action-type items present very large problems in terms of answer possibilities, as each has $2^n$ possible combinations, where n is the number of answer boxes; the calculation of discrimination indices for each, for little purpose, would not be worthwhile.
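The growth in the number of response patterns can be sketched as follows. The function names and the sample responses are hypothetical; the sketch simply counts the $2^n$ possible patterns and tallies those actually observed, rather than computing an index for each.

```python
from collections import Counter


def possible_patterns(n_boxes):
    """Number of distinct tick patterns over n answer boxes (each on or off)."""
    if n_boxes < 0:
        raise ValueError("number of answer boxes cannot be negative")
    return 2 ** n_boxes


def tally_patterns(responses):
    """Count how often each observed pattern of ticked boxes occurs.

    `responses` is an iterable of collections of box labels; the order in
    which a testee ticked the boxes is ignored.
    """
    return Counter(frozenset(r) for r in responses)


# Hypothetical responses from five testees to one action item.
observed = tally_patterns([{"A", "C"}, {"C", "A"}, {"B"}, set(), {"A", "C"}])
most_common_pattern, count = observed.most_common(1)[0]
```

With three answer boxes there are 8 possible patterns; with ten there are 1024, which is why a simple tally of the patterns observed is the more practical report.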

High rates of omission will also be of interest, both as an indicator of possible ambiguity and, when compared to position in a test, as information on the timing and administration of the test. Item analysis for items in SOFRP then took the form of reporting as much organised data as possible for each item: discrimination indices, response patterns, frequencies of response for each answer possibility, omission rates, etc. Clearly the amount of data manipulation involved was very large, and the facilities for producing the relevant information by the analysis of test data were written into a computerised item banking system, along with other facilities, described in detail in Chapter 17, below.
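A per-item report of the kind listed above might be sketched as below for a multiple-choice item. This is an illustration only and does not describe the item banking system itself; the function name `item_report`, the dictionary keys, the use of `None` for an omission, and the sample data are all assumptions.

```python
from collections import Counter


def item_report(responses, options, key):
    """Summarise responses to one multiple-choice item.

    responses: chosen option label per testee, or None where omitted
    options:   all answer possibilities offered, e.g. "ABCDE"
    key:       the label of the correct answer possibility
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no responses to report on")
    counts = Counter(responses)
    omitted = counts.pop(None, 0)
    return {
        "n": n,
        # Every option is listed, so an unused possibility shows as 0.
        "frequencies": {opt: counts.get(opt, 0) for opt in options},
        "omission_rate": omitted / n,
        "proportion_correct": counts.get(key, 0) / n,
    }


# Hypothetical responses from six testees to a 5-answer item keyed "B".
report = item_report(["A", "B", "B", None, "C", "B"], "ABCDE", "B")
```

Listing every answer possibility, including those never chosen, keeps the underused possibilities visible without treating them as faults.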

Item construction proceeded more on the basis of content referenced to the criterion of job-related reading task than on the basis of either analysis or computer system. In fact, the programming of that system involved a number of tortuous innovations to fit the system to the required item-types rather than the other way around. Analysis played its part, however, and items were constructed to the item-types considered above, with each to be either correct or incorrect (one mark for each), and with no partial credit in action items for selecting some, but not all, of the correct answer possibilities. Each action item had no more than ten answer boxes and no fewer than three, with no more than four comprising the correct pattern. Deliberate ambiguity was avoided. Completion items were exceptions to the analysis procedure: they were left to the marker to pronounce correct or incorrect (there appeared to be no gain in coding a response of some complexity for the computer system, when in so doing the marker could then pronounce on the answer). These items were marked as omit, incorrect or correct for the purposes of analysis.

Selection or Construction of Reading Passages

Each item was to be based upon one reading passage, be it continuous prose, prose plus diagram, a list, labelled diagram, a form or whatever. A very large amount of material was available from the companies and organisations visited. Certain types of material were not available, however: in the main, the internal and confidential documentation of company offices, but also material either expensive to obtain or provided by other sources. Where this was the case, materials were either borrowed briefly or analogous materials were constructed incorporating the main features.

majority of cases, rather than complete documents for reasons of space and timing per item.

Specific types of passages were sought from the materials. For example, an apprentice deals with sets of procedural directions, and apprentice material was searched for sets of such directions. The most representative of these was usually chosen; that is, the one containing most of the features common to all the sets available, such as numbered steps, separate headings, diagrams, etc. Where no pattern was clear, a number of separate items were constructed based on several passages.

This process was repeated for all jobs and types of content. Of course, this aspect of the project started almost as soon as the collection of materials did, and a number of revisions were necessary, whilst types of passage and items were duplicated. With the development of the computerised item-banking system mentioned above, it was decided that a large number of items be produced for use in future tests or in parallel versions of the one being developed, rather than a number of items purely for one test.

There were few restrictions on the use of collected materials and most documents were considered for use, whatever their format: legal documents, handbooks, manuals of instructions, forms, stocklists, coding forms, etc. A number, however, were clearly not reading tasks, strictly speaking, but writing tasks which involved filling in numbers with no reading associated with them. Further, other materials were rejected because, in the context in which they were used, they required complete oral explication by a supervisor or trainer. Other materials, though collected, had been placed under a restriction to be used for internal purposes only. Their use in item construction was therefore limited to comparison with other, similar, passages to select common features as discussed above.

As size varies in such materials and as photocopying is often

and diagrams transferred. Where possible, however, good originals were kept as received.

Item Construction

There will always be a tendency in item construction to use a particular passage because it is easy to write items about it, or to use a particular item type because a passage offers a nice set of answer possibilities for that type rather than another. It is a tendency to be avoided if at all possible, so as not to produce items with a basic mismatch in terms of content validity. Trivial items are also unclear.

Certain item types fit certain passages, according to the principles of item construction discussed above: for example, an action-type item fits a 'standards and specifications' passage, as the sort of job-related reading task with such materials often requires more than one aspect to be considered in order to be 'right'. Further, the 'what would you do next?' reading task suggests multiple-choice with 'procedural directions' or 'checkpoints' passages. Completion items lend themselves to forms, job cards, etc., where the reading tasks are quite short yet self-contained.

Using these guidelines, a large number of items were constructed. Action and completion items used one reading passage each. There were three multiple-choice items per passage. The latter were much easier to write and more were constructed than the other types. This was not the only reason, however, as most content categories and passages can best be tested via this question-type. A number of passages were not amenable to an item of the appropriate type being written about them - due to brevity or format - and rather than exclude them, one of the other types was used. At the stage of item construction, this was seen as legitimate as invalid items would be excluded at the next step, content validation (see Chapter 9).

The real, underlying, guidelines are best explained, however, by reference to Murphy (1973), who gives the following instructions to his item writers in the Adult Functional Reading Study: "Each task must look real. If the stimulus is to be a medicine label, wherever possible obtain an actual label rather than merely typing the text onto a separate piece of paper. The same holds true for pamphlets, forms, contracts, newspaper ads etc. The task must copy faithfully as possible the real world of reading. As a task writer, one of your major concerns will be the face validity of the materials you produce. They must 'look real', have some evident benefit to the respondent and be directly related to the kinds of reading most people do ... Remember that the difficulty of the task is to be a function of the stimulus material, not the questions we ask about it" (pp. 52-53).

Examples of questions and reading passages (an item is one question and its associated passage) can be seen in the Functional Reading Test, Form A, given as an Appendix to this work (Appendix VII). Questions 1 to 6 inclusive are action-type; question 7 is a completion item; and questions 8 to 31 are multiple-choice.

CHAPTER 9